Overview


Dataset statistics

Number of variables: 42
Number of observations: 196294
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 122
Duplicate rows (%): 0.1%
Total size in memory: 64.4 MiB
Average record size in memory: 344.0 B
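
The percentage and per-record figures in this table follow from the raw counts. A minimal sketch, assuming the memory total is expressed in MiB (1 MiB = 1,048,576 bytes); because the report shows a rounded total, the derived values agree only to the displayed precision:

```python
# Values copied from the "Dataset statistics" table.
n_rows = 196_294
n_duplicates = 122
total_mib = 64.4  # rounded in the report, so results are approximate

duplicate_pct = 100 * n_duplicates / n_rows
avg_record_bytes = total_mib * 1024**2 / n_rows  # assumes 1 MiB = 1024**2 bytes

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.0f} B")
```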

Variable types

Numeric: 8
Categorical: 33
Text: 1

Alerts

Dataset has 122 (0.1%) duplicate rows [Duplicates]
age is highly overall correlated with detailed_household_and_family_stat and 2 other fields [High correlation]
citizenship is highly overall correlated with country_of_birth_father and 2 other fields [High correlation]
class_of_worker is highly overall correlated with detailed_industry_recode and 4 other fields [High correlation]
country_of_birth_father is highly overall correlated with citizenship and 3 other fields [High correlation]
country_of_birth_mother is highly overall correlated with citizenship and 3 other fields [High correlation]
country_of_birth_self is highly overall correlated with citizenship and 2 other fields [High correlation]
detailed_household_and_family_stat is highly overall correlated with age and 4 other fields [High correlation]
detailed_household_summary_in_household is highly overall correlated with detailed_household_and_family_stat and 3 other fields [High correlation]
detailed_industry_recode is highly overall correlated with class_of_worker and 2 other fields [High correlation]
detailed_occupation_recode is highly overall correlated with class_of_worker and 2 other fields [High correlation]
education is highly overall correlated with tax_filer_stat and 1 other field [High correlation]
family_members_under_18 is highly overall correlated with detailed_household_and_family_stat and 3 other fields [High correlation]
fill_inc_questionnaire_for_veteran's_admin is highly overall correlated with veterans_benefits [High correlation]
full_or_part_time_employment_stat is highly overall correlated with live_in_this_house_1_year_ago and 2 other fields [High correlation]
hispanic_origin is highly overall correlated with country_of_birth_father and 1 other field [High correlation]
live_in_this_house_1_year_ago is highly overall correlated with full_or_part_time_employment_stat and 6 other fields [High correlation]
major_industry_code is highly overall correlated with class_of_worker and 3 other fields [High correlation]
major_occupation_code is highly overall correlated with class_of_worker and 3 other fields [High correlation]
marital_stat is highly overall correlated with tax_filer_stat [High correlation]
migration_code_change_in_msa is highly overall correlated with live_in_this_house_1_year_ago and 5 other fields [High correlation]
migration_code_change_in_reg is highly overall correlated with full_or_part_time_employment_stat and 4 other fields [High correlation]
migration_code_move_within_reg is highly overall correlated with live_in_this_house_1_year_ago and 5 other fields [High correlation]
migration_prev_res_in_sunbelt is highly overall correlated with live_in_this_house_1_year_ago and 3 other fields [High correlation]
num_persons_worked_for_employer is highly overall correlated with class_of_worker and 2 other fields [High correlation]
region_of_previous_residence is highly overall correlated with live_in_this_house_1_year_ago and 3 other fields [High correlation]
tax_filer_stat is highly overall correlated with age and 8 other fields [High correlation]
veterans_benefits is highly overall correlated with age and 6 other fields [High correlation]
weeks_worked_in_year is highly overall correlated with num_persons_worked_for_employer and 1 other field [High correlation]
year is highly overall correlated with full_or_part_time_employment_stat and 4 other fields [High correlation]
enroll_in_edu_inst_last_wk is highly imbalanced (74.4%) [Imbalance]
race is highly imbalanced (62.0%) [Imbalance]
hispanic_origin is highly imbalanced (71.5%) [Imbalance]
member_of_a_labor_union is highly imbalanced (67.1%) [Imbalance]
reason_for_unemployment is highly imbalanced (89.3%) [Imbalance]
region_of_previous_residence is highly imbalanced (77.9%) [Imbalance]
migration_code_move_within_reg is highly imbalanced (54.4%) [Imbalance]
migration_prev_res_in_sunbelt is highly imbalanced (69.8%) [Imbalance]
family_members_under_18 is highly imbalanced (50.4%) [Imbalance]
country_of_birth_father is highly imbalanced (70.7%) [Imbalance]
country_of_birth_mother is highly imbalanced (71.4%) [Imbalance]
country_of_birth_self is highly imbalanced (81.6%) [Imbalance]
citizenship is highly imbalanced (65.3%) [Imbalance]
own_business_or_self_employed is highly imbalanced (67.6%) [Imbalance]
fill_inc_questionnaire_for_veteran's_admin is highly imbalanced (94.4%) [Imbalance]
target is highly imbalanced (66.0%) [Imbalance]
dividends_from_stocks is highly skewed (γ1 = 27.56720148) [Skewed]
age has 2643 (1.3%) zeros [Zeros]
wage_per_hour has 184991 (94.2%) zeros [Zeros]
capital_gains has 188915 (96.2%) zeros [Zeros]
capital_losses has 192388 (98.0%) zeros [Zeros]
dividends_from_stocks has 175156 (89.2%) zeros [Zeros]
num_persons_worked_for_employer has 92770 (47.3%) zeros [Zeros]
weeks_worked_in_year has 92770 (47.3%) zeros [Zeros]

Reproduction

Analysis started: 2025-01-20 00:36:58.941612
Analysis finished: 2025-01-20 00:37:53.710621
Duration: 54.77 seconds
Software version: ydata-profiling v4.12.1
Download configuration: config.json
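
The reported duration is the difference between the two timestamps. A short sketch using only the standard library, with the timestamps copied from this section:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2025-01-20 00:36:58.941612")
finished = datetime.fromisoformat("2025-01-20 00:37:53.710621")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```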

Variables

age
Real number (ℝ)

High correlation  Zeros 

Distinct: 91
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.929468
Minimum: 0
Maximum: 90
Zeros: 2643
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 16
Median: 34
Q3: 50
95th percentile: 75
Maximum: 90
Range: 90
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 22.210001
Coefficient of variation (CV): 0.63585282
Kurtosis: -0.72745803
Mean: 34.929468
Median Absolute Deviation (MAD): 17
Skewness: 0.35720878
Sum: 6856445
Variance: 493.28413
Monotonicity: Not monotonic
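
Several of the figures for age are simple functions of the others. The sketch below recomputes them from the values printed in this section; since the inputs are already rounded, the results match the report only to within rounding error:

```python
# Summary values for `age`, copied from the report.
mean = 34.929468
std = 22.210001
q1, q3 = 16, 50
minimum, maximum = 0, 90

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum

print(f"CV={cv:.4f} variance={variance:.3f} IQR={iqr} range={value_range}")
```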
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
34                  3486     1.8%
35                  3450     1.8%
36                  3352     1.7%
31                  3349     1.7%
33                  3340     1.7%
37                  3278     1.7%
38                  3277     1.7%
30                  3202     1.6%
32                  3187     1.6%
39                  3144     1.6%
Other values (81)   163229   83.2%

Value   Count   Frequency (%)
0       2643    1.3%
1       2954    1.5%
2       3031    1.5%
3       3059    1.6%
4       3108    1.6%
5       3090    1.6%
6       3014    1.5%
7       2980    1.5%
8       3004    1.5%
9       2941    1.5%

Value   Count   Frequency (%)
90      722     0.4%
89      195     0.1%
88      241     0.1%
87      301     0.2%
86      348     0.2%
85      423     0.2%
84      519     0.3%
83      561     0.3%
82      614     0.3%
81      718     0.4%

class_of_worker
Categorical

High correlation 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 MiB

Not in universe: 97029
Private sector: 72021
Government: 14935
Self-employed: 11706
Not employed: 603

Length

Max length: 15
Median length: 14
Mean length: 14.124186
Min length: 10

Characters and Unicode

Total characters: 2772493
Distinct characters: 23
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
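
The length and character totals for class_of_worker can be recovered from the category counts alone. A minimal sketch, assuming the category strings carry no leading or trailing whitespace in this column (the sample rows suggest they do not):

```python
# Category counts for `class_of_worker`, copied from the report.
counts = {
    "Not in universe": 97029,
    "Private sector": 72021,
    "Government": 14935,
    "Self-employed": 11706,
    "Not employed": 603,
}

n_rows = sum(counts.values())
# Each category contributes len(value) characters per occurrence.
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / n_rows
max_length = max(len(value) for value in counts)
min_length = min(len(value) for value in counts)

print(n_rows, total_chars, f"{mean_length:.6f}", max_length, min_length)
```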

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not in universe
2nd row: Self-employed
3rd row: Not in universe
4th row: Not in universe
5th row: Not in universe

Common Values

Value             Count   Frequency (%)
Not in universe   97029   49.4%
Private sector    72021   36.7%
Government        14935   7.6%
Self-employed     11706   6.0%
Not employed      603     0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value           Count   Frequency (%)
not             97632   21.1%
in              97029   21.0%
universe        97029   21.0%
private         72021   15.6%
sector          72021   15.6%
government      14935   3.2%
self-employed   11706   2.5%
employed        603     0.1%
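
The word counts in this table are consistent with lowercasing each category and splitting it on whitespace. That tokenization is an assumption inferred from the numbers (for example, "self-employed" stays a single token), not a statement of how the profiler is implemented:

```python
from collections import Counter

# Category counts for `class_of_worker`, copied from the report.
counts = {
    "Not in universe": 97029,
    "Private sector": 72021,
    "Government": 14935,
    "Self-employed": 11706,
    "Not employed": 603,
}

# Assumed tokenization: lowercase, then split on whitespace.
words = Counter()
for value, n in counts.items():
    for word in value.lower().split():
        words[word] += n

total_words = sum(words.values())
for word, n in words.most_common():
    print(f"{word:<15}{n:>8}{100 * n / total_words:>7.1f}%")
```

Under this assumption, "not" totals 97632 because it occurs in both "Not in universe" (97029) and "Not employed" (603).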

Most occurring characters

Value               Count    Frequency (%)
e                   404294   14.6%
(space)             266682   9.6%
i                   266079   9.6%
t                   256609   9.3%
r                   256006   9.2%
n                   223928   8.1%
o                   196897   7.1%
v                   183985   6.6%
s                   169050   6.1%
N                   97632    3.5%
Other values (13)   451331   16.3%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   2297811   82.9%
Space Separator    266682    9.6%
Uppercase Letter   196294    7.1%
Dash Punctuation   11706     0.4%
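
The category names in this table correspond to Unicode general categories. A sketch using Python's standard unicodedata module, which returns the two-letter codes (Ll = Lowercase Letter, Zs = Space Separator, Lu = Uppercase Letter, Pd = Dash Punctuation); the counts dictionary is copied from the report and the weighting by category count is the only logic added here:

```python
import unicodedata
from collections import Counter

# Category counts for `class_of_worker`, copied from the report.
counts = {
    "Not in universe": 97029,
    "Private sector": 72021,
    "Government": 14935,
    "Self-employed": 11706,
    "Not employed": 603,
}

# Tally the Unicode general category of every character,
# weighted by how often each category value occurs.
categories = Counter()
for value, n in counts.items():
    for ch in value:
        categories[unicodedata.category(ch)] += n

for code, n in categories.most_common():
    print(code, n)
```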

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 404294
17.6%
i 266079
11.6%
t 256609
11.2%
r 256006
11.1%
n 223928
9.7%
o 196897
8.6%
v 183985
8.0%
s 169050
7.4%
u 97029
 
4.2%
c 72021
 
3.1%
Other values (7) 171913
7.5%
Uppercase Letter
ValueCountFrequency (%)
N 97632
49.7%
P 72021
36.7%
G 14935
 
7.6%
S 11706
 
6.0%
Space Separator
ValueCountFrequency (%)
266682
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11706
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2494105
90.0%
Common 278388
 
10.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 404294
16.2%
i 266079
10.7%
t 256609
10.3%
r 256006
10.3%
n 223928
9.0%
o 196897
7.9%
v 183985
7.4%
s 169050
6.8%
N 97632
 
3.9%
u 97029
 
3.9%
Other values (11) 342596
13.7%
Common
ValueCountFrequency (%)
266682
95.8%
- 11706
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2772493
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 404294
14.6%
266682
9.6%
i 266079
9.6%
t 256609
9.3%
r 256006
9.2%
n 223928
8.1%
o 196897
7.1%
v 183985
6.6%
s 169050
 
6.1%
N 97632
 
3.5%
Other values (13) 451331
16.3%

detailed_industry_recode
Categorical

High correlation 

Distinct42
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe or children
97467 
Public administration
17521 
Manufacturing
 
9367
Manufacturing-durable goods
 
5984
Business and repair services
 
5973
Other values (37)
59982 

Length

Max length58
Median length27
Mean length24.836225
Min length5

Characters and Unicode

Total characters4875202
Distinct characters39
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNot in universe or children
2nd rowManufacturing-durable goods
3rd rowNot in universe or children
4th rowNot in universe or children
5th rowNot in universe or children

Common Values

ValueCountFrequency (%)
Not in universe or children 97467
49.7%
Public administration 17521
 
8.9%
Manufacturing 9367
 
4.8%
Manufacturing-durable goods 5984
 
3.0%
Business and repair services 5973
 
3.0%
Public administration and armed forces 4683
 
2.4%
Wholesale and retail trade 4648
 
2.4%
Professional services 4616
 
2.4%
Trade 4482
 
2.3%
Professional and related services 3889
 
2.0%
Other values (32) 37664
 
19.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not 99171
13.6%
or 97467
13.3%
children 97467
13.3%
in 97467
13.3%
universe 97467
13.3%
services 27917
 
3.8%
and 27058
 
3.7%
public 23275
 
3.2%
administration 22204
 
3.0%
trade 10605
 
1.5%
Other values (45) 131263
17.9%

Most occurring characters

ValueCountFrequency (%)
535067
11.0%
i 514356
10.6%
n 480741
9.9%
e 473536
 
9.7%
r 467280
 
9.6%
o 294803
 
6.0%
s 272236
 
5.6%
t 227261
 
4.7%
a 210547
 
4.3%
u 204908
 
4.2%
Other values (29) 1194467
24.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4111773
84.3%
Space Separator 535067
 
11.0%
Uppercase Letter 199151
 
4.1%
Other Punctuation 16529
 
0.3%
Dash Punctuation 12682
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 514356
12.5%
n 480741
11.7%
e 473536
11.5%
r 467280
11.4%
o 294803
 
7.2%
s 272236
 
6.6%
t 227261
 
5.5%
a 210547
 
5.1%
u 204908
 
5.0%
c 197282
 
4.8%
Other values (11) 768823
18.7%
Uppercase Letter
ValueCountFrequency (%)
N 99171
49.8%
P 35429
 
17.8%
M 25814
 
13.0%
B 8910
 
4.5%
T 8654
 
4.3%
W 6592
 
3.3%
H 4045
 
2.0%
F 2119
 
1.1%
E 2077
 
1.0%
S 1644
 
0.8%
Other values (5) 4696
 
2.4%
Space Separator
ValueCountFrequency (%)
535067
100.0%
Other Punctuation
ValueCountFrequency (%)
, 16529
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 12682
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4310924
88.4%
Common 564278
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 514356
11.9%
n 480741
11.2%
e 473536
11.0%
r 467280
10.8%
o 294803
 
6.8%
s 272236
 
6.3%
t 227261
 
5.3%
a 210547
 
4.9%
u 204908
 
4.8%
c 197282
 
4.6%
Other values (26) 967974
22.5%
Common
ValueCountFrequency (%)
535067
94.8%
, 16529
 
2.9%
- 12682
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4875202
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
535067
11.0%
i 514356
10.6%
n 480741
9.9%
e 473536
 
9.7%
r 467280
 
9.6%
o 294803
 
6.0%
s 272236
 
5.6%
t 227261
 
4.7%
a 210547
 
4.3%
u 204908
 
4.2%
Other values (29) 1194467
24.5%

detailed_occupation_recode
Categorical

High correlation 

Distinct47
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe
97467 
Other executive, admin and managerial
 
8756
Food service occupations
 
7886
Computer equipment operators
 
5412
Personal service occupations
 
5105
Other values (42)
71668 

Length

Max length46
Median length43
Mean length23.385646
Min length9

Characters and Unicode

Total characters4590462
Distinct characters40
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNot in universe
2nd rowAutomobile mechanics and repairers
3rd rowNot in universe
4th rowNot in universe
5th rowNot in universe

Common Values

ValueCountFrequency (%)
Not in universe 97467
49.7%
Other executive, admin and managerial 8756
 
4.5%
Food service occupations 7886
 
4.0%
Computer equipment operators 5412
 
2.8%
Personal service occupations 5105
 
2.6%
Construction trades 4144
 
2.1%
Automobile mechanics and repairers 4025
 
2.1%
Teachers, except college and university 3683
 
1.9%
Supervisors and proprietors, sales occupations 3445
 
1.8%
Other administrative support occupations 3392
 
1.7%
Other values (37) 52979
27.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not 97467
14.8%
universe 97467
14.8%
in 97467
14.8%
occupations 47831
 
7.3%
and 46816
 
7.1%
other 21968
 
3.3%
service 18046
 
2.7%
operators 10243
 
1.6%
related 9882
 
1.5%
admin 9300
 
1.4%
Other values (83) 200943
30.6%

Most occurring characters

ValueCountFrequency (%)
e 502225
10.9%
461136
10.0%
i 419088
 
9.1%
n 410204
 
8.9%
o 330710
 
7.2%
t 318313
 
6.9%
r 316394
 
6.9%
s 309366
 
6.7%
a 258923
 
5.6%
u 201716
 
4.4%
Other values (30) 1062387
23.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3912476
85.2%
Space Separator 461136
 
10.0%
Uppercase Letter 196294
 
4.3%
Other Punctuation 20556
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 502225
12.8%
i 419088
10.7%
n 410204
10.5%
o 330710
8.5%
t 318313
8.1%
r 316394
8.1%
s 309366
7.9%
a 258923
 
6.6%
u 201716
 
5.2%
c 194338
 
5.0%
Other values (15) 651199
16.6%
Uppercase Letter
ValueCountFrequency (%)
N 97908
49.9%
O 22512
 
11.5%
F 17128
 
8.7%
C 12809
 
6.5%
P 12314
 
6.3%
M 7395
 
3.8%
S 5287
 
2.7%
H 4933
 
2.5%
E 4529
 
2.3%
T 4421
 
2.3%
Other values (3) 7058
 
3.6%
Space Separator
ValueCountFrequency (%)
461136
100.0%
Other Punctuation
ValueCountFrequency (%)
, 20556
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4108770
89.5%
Common 481692
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 502225
12.2%
i 419088
10.2%
n 410204
10.0%
o 330710
 
8.0%
t 318313
 
7.7%
r 316394
 
7.7%
s 309366
 
7.5%
a 258923
 
6.3%
u 201716
 
4.9%
c 194338
 
4.7%
Other values (28) 847493
20.6%
Common
ValueCountFrequency (%)
461136
95.7%
, 20556
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4590462
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 502225
10.9%
461136
10.0%
i 419088
 
9.1%
n 410204
 
8.9%
o 330710
 
7.2%
t 318313
 
6.9%
r 316394
 
6.9%
s 309366
 
6.7%
a 258923
 
5.6%
u 201716
 
4.4%
Other values (30) 1062387
23.1%

education
Categorical

High correlation 

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
High School Graduate
48374 
Children
44347 
Some College
37530 
Below High School
36588 
College Graduate
19859 

Length

Max length20
Median length16
Mean length14.551112
Min length8

Characters and Unicode

Total characters2856296
Distinct characters24
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowHigh School Graduate
2nd rowSome College
3rd rowBelow High School
4th rowChildren
5th rowChildren

Common Values

ValueCountFrequency (%)
High School Graduate 48374
24.6%
Children 44347
22.6%
Some College 37530
19.1%
Below High School 36588
18.6%
College Graduate 19859
10.1%
Advanced Degree 9596
 
4.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
high 84962
19.6%
school 84962
19.6%
graduate 68233
15.8%
college 57389
13.2%
children 44347
10.2%
some 37530
8.7%
below 36588
8.4%
advanced 9596
 
2.2%
degree 9596
 
2.2%

Most occurring characters

ValueCountFrequency (%)
e 339860
11.9%
o 301431
 
10.6%
l 280675
 
9.8%
236909
 
8.3%
h 214271
 
7.5%
g 151947
 
5.3%
a 146062
 
5.1%
d 131772
 
4.6%
i 129309
 
4.5%
S 122492
 
4.3%
Other values (14) 801568
28.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2186184
76.5%
Uppercase Letter 433203
 
15.2%
Space Separator 236909
 
8.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 339860
15.5%
o 301431
13.8%
l 280675
12.8%
h 214271
9.8%
g 151947
7.0%
a 146062
6.7%
d 131772
 
6.0%
i 129309
 
5.9%
r 122176
 
5.6%
c 94558
 
4.3%
Other values (6) 274123
12.5%
Uppercase Letter
ValueCountFrequency (%)
S 122492
28.3%
C 101736
23.5%
H 84962
19.6%
G 68233
15.8%
B 36588
 
8.4%
A 9596
 
2.2%
D 9596
 
2.2%
Space Separator
ValueCountFrequency (%)
236909
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2619387
91.7%
Common 236909
 
8.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 339860
13.0%
o 301431
11.5%
l 280675
10.7%
h 214271
 
8.2%
g 151947
 
5.8%
a 146062
 
5.6%
d 131772
 
5.0%
i 129309
 
4.9%
S 122492
 
4.7%
r 122176
 
4.7%
Other values (13) 679392
25.9%
Common
ValueCountFrequency (%)
236909
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2856296
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 339860
11.9%
o 301431
 
10.6%
l 280675
 
9.8%
236909
 
8.3%
h 214271
 
7.5%
g 151947
 
5.3%
a 146062
 
5.1%
d 131772
 
4.6%
i 129309
 
4.5%
S 122492
 
4.3%
Other values (14) 801568
28.1%

wage_per_hour
Real number (ℝ)

Zeros 

Distinct: 1240
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56.336505
Minimum: 0
Maximum: 9999
Zeros: 184991
Zeros (%): 94.2%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 500
Maximum: 9999
Range: 9999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 277.05433
Coefficient of variation (CV): 4.9178473
Kurtosis: 152.75307
Mean: 56.336505
Median Absolute Deviation (MAD): 0
Skewness: 8.8617868
Sum: 11058518
Variance: 76759.103
Monotonicity: Not monotonic
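
Because 94.2% of wage_per_hour values are zero, the overall mean says little about the records that do report a wage. The sketch below derives the mean over non-zero records from the sum and zero count printed in this section; it is a derived figure, not one the report itself states, and no unit is assumed:

```python
# Values for `wage_per_hour`, copied from the report.
n_rows = 196_294
n_zeros = 184_991
total = 11_058_518  # the "Sum" statistic

n_nonzero = n_rows - n_zeros
mean_all = total / n_rows          # should match the reported mean
mean_nonzero = total / n_nonzero   # mean over non-zero records only

print(f"non-zero records: {n_nonzero}")
print(f"mean (all): {mean_all:.2f}, mean (non-zero): {mean_nonzero:.2f}")
```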
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 184991
94.2%
500 734
 
0.4%
600 546
 
0.3%
700 534
 
0.3%
800 507
 
0.3%
1000 386
 
0.2%
425 375
 
0.2%
900 336
 
0.2%
550 280
 
0.1%
1200 256
 
0.1%
Other values (1230) 7349
 
3.7%
ValueCountFrequency (%)
0 184991
94.2%
20 1
 
< 0.1%
70 1
 
< 0.1%
75 2
 
< 0.1%
100 11
 
< 0.1%
110 1
 
< 0.1%
125 1
 
< 0.1%
135 1
 
< 0.1%
143 1
 
< 0.1%
150 6
 
< 0.1%
ValueCountFrequency (%)
9999 1
 
< 0.1%
9916 1
 
< 0.1%
9800 2
< 0.1%
9400 2
< 0.1%
9000 1
 
< 0.1%
8800 1
 
< 0.1%
8600 1
 
< 0.1%
8500 1
 
< 0.1%
8300 1
 
< 0.1%
8000 4
< 0.1%

enroll_in_edu_inst_last_wk
Categorical

Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 MiB

Not in universe: 183762
High school: 6853
College or university: 5679

Length

Max length22
Median length16
Mean length16.033939
Min length12

Characters and Unicode

Total characters3147366
Distinct characters18
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row High school
4th row Not in universe
5th row Not in universe

Common Values

Value                   Count    Frequency (%)
Not in universe         183762   93.6%
High school             6853     3.5%
College or university   5679     2.9%
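
The 74.4% imbalance figure reported for this variable can be reproduced from these counts with a normalized-entropy score: one minus the Shannon entropy of the class distribution divided by its maximum, log2(k) for k classes. That this is the exact formula the profiler uses is an assumption; the sketch only shows that it matches the reported value, and the function name is illustrative:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, approaching 1 as
    one class dominates. H is the Shannon entropy in bits."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts for `enroll_in_edu_inst_last_wk`, copied from the report.
enroll = [183762, 6853, 5679]
print(f"{100 * imbalance_score(enroll):.1f}%")
```

Applying the same function to the race counts (164380, 20206, 5821, 3645, 2242) gives roughly 62%, in line with the alert listed for that variable.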

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 183762
31.6%
in 183762
31.6%
universe 183762
31.6%
high 6853
 
1.2%
school 6853
 
1.2%
college 5679
 
1.0%
or 5679
 
1.0%
university 5679
 
1.0%

Most occurring characters

ValueCountFrequency (%)
582029
18.5%
i 385735
12.3%
e 384561
12.2%
n 373203
11.9%
o 208826
 
6.6%
s 196294
 
6.2%
r 195120
 
6.2%
v 189441
 
6.0%
u 189441
 
6.0%
t 189441
 
6.0%
Other values (8) 253275
8.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2369043
75.3%
Space Separator 582029
 
18.5%
Uppercase Letter 196294
 
6.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 385735
16.3%
e 384561
16.2%
n 373203
15.8%
o 208826
8.8%
s 196294
8.3%
r 195120
8.2%
v 189441
8.0%
u 189441
8.0%
t 189441
8.0%
l 18211
 
0.8%
Other values (4) 38770
 
1.6%
Uppercase Letter
ValueCountFrequency (%)
N 183762
93.6%
H 6853
 
3.5%
C 5679
 
2.9%
Space Separator
ValueCountFrequency (%)
582029
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2565337
81.5%
Common 582029
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 385735
15.0%
e 384561
15.0%
n 373203
14.5%
o 208826
8.1%
s 196294
7.7%
r 195120
7.6%
v 189441
7.4%
u 189441
7.4%
t 189441
7.4%
N 183762
7.2%
Other values (7) 69513
 
2.7%
Common
ValueCountFrequency (%)
582029
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3147366
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
582029
18.5%
i 385735
12.3%
e 384561
12.2%
n 373203
11.9%
o 208826
 
6.6%
s 196294
 
6.2%
r 195120
 
6.2%
v 189441
 
6.0%
u 189441
 
6.0%
t 189441
 
6.0%
Other values (8) 253275
8.0%

marital_stat
Categorical

High correlation 

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Married
84859 
Never Married
83296 
Divorced
12707 
Widowed
10456 
Separated
 
3459

Length

Max length21
Median length13
Mean length9.7542309
Min length7

Characters and Unicode

Total characters1914697
Distinct characters22
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowWidowed
2nd rowDivorced
3rd rowNever Married
4th rowNever Married
5th rowNever Married

Common Values

ValueCountFrequency (%)
Married 84859
43.2%
Never Married 83296
42.4%
Divorced 12707
 
6.5%
Widowed 10456
 
5.3%
Separated 3459
 
1.8%
Married-spouse absent 1517
 
0.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
married 168155
59.8%
never 83296
29.6%
divorced 12707
 
4.5%
widowed 10456
 
3.7%
separated 3459
 
1.2%
married-spouse 1517
 
0.5%
absent 1517
 
0.5%

Most occurring characters

ValueCountFrequency (%)
r 438806
22.9%
e 369379
19.3%
d 206750
10.8%
i 192835
10.1%
a 178107
9.3%
M 169672
 
8.9%
v 96003
 
5.0%
84813
 
4.4%
N 83296
 
4.4%
o 24680
 
1.3%
Other values (12) 70356
 
3.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1548777
80.9%
Uppercase Letter 279590
 
14.6%
Space Separator 84813
 
4.4%
Dash Punctuation 1517
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 438806
28.3%
e 369379
23.8%
d 206750
13.3%
i 192835
12.5%
a 178107
11.5%
v 96003
 
6.2%
o 24680
 
1.6%
c 12707
 
0.8%
w 10456
 
0.7%
p 4976
 
0.3%
Other values (5) 14078
 
0.9%
Uppercase Letter
ValueCountFrequency (%)
M 169672
60.7%
N 83296
29.8%
D 12707
 
4.5%
W 10456
 
3.7%
S 3459
 
1.2%
Space Separator
ValueCountFrequency (%)
84813
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1517
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1828367
95.5%
Common 86330
 
4.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 438806
24.0%
e 369379
20.2%
d 206750
11.3%
i 192835
10.5%
a 178107
9.7%
M 169672
 
9.3%
v 96003
 
5.3%
N 83296
 
4.6%
o 24680
 
1.3%
c 12707
 
0.7%
Other values (10) 56132
 
3.1%
Common
ValueCountFrequency (%)
84813
98.2%
- 1517
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1914697
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 438806
22.9%
e 369379
19.3%
d 206750
10.8%
i 192835
10.1%
a 178107
9.3%
M 169672
 
8.9%
v 96003
 
5.0%
84813
 
4.4%
N 83296
 
4.4%
o 24680
 
1.3%
Other values (12) 70356
 
3.7%

major_industry_code
Categorical

High correlation 

Distinct24
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe or children
97467 
Retail trade
17069 
Manufacturing-durable goods
 
9014
Education
 
8283
Manufacturing-nondurable goods
 
6895
Other values (19)
57566 

Length

Max length36
Median length28
Mean length24.337417
Min length7

Characters and Unicode

Total characters4777289
Distinct characters38
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row Not in universe or children
2nd row Construction
3rd row Not in universe or children
4th row Not in universe or children
5th row Not in universe or children

Common Values

ValueCountFrequency (%)
Not in universe or children 97467
49.7%
Retail trade 17069
 
8.7%
Manufacturing-durable goods 9014
 
4.6%
Education 8283
 
4.2%
Manufacturing-nondurable goods 6895
 
3.5%
Finance insurance and real estate 6145
 
3.1%
Construction 5984
 
3.0%
Business and repair services 5651
 
2.9%
Medical except hospital 4683
 
2.4%
Public administration 4610
 
2.3%
Other values (14) 30493
 
15.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not 97467
13.7%
universe 97467
13.7%
or 97467
13.7%
children 97467
13.7%
in 97467
13.7%
services 21704
 
3.1%
trade 20663
 
2.9%
retail 17069
 
2.4%
goods 15909
 
2.2%
and 13160
 
1.9%
Other values (34) 135458
19.0%

Most occurring characters

ValueCountFrequency (%)
711298
14.9%
e 483445
10.1%
i 445075
 
9.3%
n 436324
 
9.1%
r 434473
 
9.1%
o 298089
 
6.2%
t 238790
 
5.0%
s 230048
 
4.8%
a 190730
 
4.0%
c 185335
 
3.9%
Other values (28) 1123682
23.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3847878
80.5%
Space Separator 711298
 
14.9%
Uppercase Letter 202204
 
4.2%
Dash Punctuation 15909
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 483445
12.6%
i 445075
11.6%
n 436324
11.3%
r 434473
11.3%
o 298089
7.7%
t 238790
 
6.2%
s 230048
 
6.0%
a 190730
 
5.0%
c 185335
 
4.8%
u 184035
 
4.8%
Other values (11) 721534
18.8%
Uppercase Letter
ValueCountFrequency (%)
N 97467
48.2%
M 21155
 
10.5%
R 17069
 
8.4%
E 9933
 
4.9%
H 9838
 
4.9%
P 8492
 
4.2%
C 7165
 
3.5%
F 6367
 
3.1%
B 5651
 
2.8%
O 4482
 
2.2%
Other values (5) 14585
 
7.2%
Space Separator
ValueCountFrequency (%)
711298
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 15909
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4050082
84.8%
Common 727207
 
15.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 483445
11.9%
i 445075
11.0%
n 436324
10.8%
r 434473
10.7%
o 298089
 
7.4%
t 238790
 
5.9%
s 230048
 
5.7%
a 190730
 
4.7%
c 185335
 
4.6%
u 184035
 
4.5%
Other values (26) 923738
22.8%
Common
ValueCountFrequency (%)
711298
97.8%
- 15909
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4777289
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
711298
14.9%
e 483445
10.1%
i 445075
 
9.3%
n 436324
 
9.1%
r 434473
 
9.1%
o 298089
 
6.2%
t 238790
 
5.0%
s 230048
 
4.8%
a 190730
 
4.0%
c 185335
 
3.9%
Other values (28) 1123682
23.5%

major_occupation_code
Categorical

High correlation 

Distinct15
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe
97467 
Adm support including clerical
14836 
Professional specialty
13940 
Executive admin and managerial
12495 
Other service
12097 
Other values (10)
45459 

Length

Max length38
Median length36
Mean length20.842002
Min length6

Characters and Unicode

Total characters4091160
Distinct characters34
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row Not in universe
2nd row Precision production craft & repair
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

ValueCountFrequency (%)
Not in universe 97467
49.7%
Adm support including clerical 14836
 
7.6%
Professional specialty 13940
 
7.1%
Executive admin and managerial 12495
 
6.4%
Other service 12097
 
6.2%
Sales 11781
 
6.0%
Precision production craft & repair 10517
 
5.4%
Machine operators assmblrs & inspctrs 6377
 
3.2%
Handlers equip cleaners etc 4126
 
2.1%
Transportation and material moving 4020
 
2.0%
Other values (5) 8638
 
4.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not 97467
15.9%
in 97467
15.9%
universe 97467
15.9%
and 22676
 
3.7%
support 17854
 
2.9%
16894
 
2.8%
clerical 14836
 
2.4%
adm 14836
 
2.4%
including 14836
 
2.4%
professional 13940
 
2.3%
Other values (33) 204739
33.4%

Most occurring characters

ValueCountFrequency (%)
617138
15.1%
i 408259
10.0%
e 403678
9.9%
n 352634
 
8.6%
r 296592
 
7.2%
s 257072
 
6.3%
t 214090
 
5.2%
o 205966
 
5.0%
a 201609
 
4.9%
u 158075
 
3.9%
Other values (24) 976047
23.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3260798
79.7%
Space Separator 617138
 
15.1%
Uppercase Letter 196330
 
4.8%
Other Punctuation 16894
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 408259
12.5%
e 403678
12.4%
n 352634
10.8%
r 296592
9.1%
s 257072
7.9%
t 214090
 
6.6%
o 205966
 
6.3%
a 201609
 
6.2%
u 158075
 
4.8%
c 145771
 
4.5%
Other values (12) 617052
18.9%
Uppercase Letter
ValueCountFrequency (%)
N 97467
49.6%
P 26898
 
13.7%
A 14872
 
7.6%
E 12495
 
6.4%
O 12097
 
6.2%
S 11781
 
6.0%
T 7038
 
3.6%
M 6377
 
3.2%
H 4126
 
2.1%
F 3179
 
1.6%
Space Separator
ValueCountFrequency (%)
617138
100.0%
Other Punctuation
ValueCountFrequency (%)
& 16894
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3457128
84.5%
Common 634032
 
15.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 408259
11.8%
e 403678
11.7%
n 352634
10.2%
r 296592
 
8.6%
s 257072
 
7.4%
t 214090
 
6.2%
o 205966
 
6.0%
a 201609
 
5.8%
u 158075
 
4.6%
c 145771
 
4.2%
Other values (22) 813382
23.5%
Common
ValueCountFrequency (%)
617138
97.3%
& 16894
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4091160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
617138
15.1%
i 408259
10.0%
e 403678
9.9%
n 352634
 
8.6%
r 296592
 
7.2%
s 257072
 
6.3%
t 214090
 
5.2%
o 205966
 
5.0%
a 201609
 
4.9%
u 158075
 
3.9%
Other values (24) 976047
23.9%

race
Categorical

Imbalance 

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value                         Count
White                         164380
Black                         20206
Asian or Pacific Islander     5821
Other                         3645
Amer Indian Aleut or Eskimo   2242

Length

Max length 28
Median length 6
Mean length 6.8443661
Min length 6

Characters and Unicode

Total characters 1343508
Distinct characters 24
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
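
As an illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to tally the general category of every character in a list of strings. The function name `category_counts` and the two sample labels are illustrative assumptions; this is not the profiling tool's own code.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Ll', 'Zs') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical sample: two labels of the kind stored in this column.
sample = ["White", "Asian or Pacific Islander"]
counts = category_counts(sample)

# 'Lu' = Uppercase Letter, 'Ll' = Lowercase Letter, 'Zs' = Space Separator
print(counts)
```

Applied to a whole column, weighting each label by its row count would give totals of the kind listed under "Most occurring categories".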

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row White
2nd row White
3rd row Asian or Pacific Islander
4th row White
5th row White

Common Values

Value                         Count    Frequency (%)
White                         164380   83.7%
Black                         20206    10.3%
Asian or Pacific Islander     5821     3.0%
Other                         3645     1.9%
Amer Indian Aleut or Eskimo   2242     1.1%
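
The "Imbalance" flag on this variable reflects how strongly the counts concentrate in a single category. One common way to quantify this is one minus the normalized Shannon entropy of the category counts; whether the report uses exactly this formula or threshold is an assumption, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, close to 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(k)


# Counts copied from the Common Values table for `race`.
race_counts = [164380, 20206, 5821, 3645, 2242]
score = imbalance_score(race_counts)
print(f"imbalance score: {score:.3f}")
```

A perfectly balanced variable scores 0 under this definition, so a higher score means a more lopsided distribution.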

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count    Frequency (%)
white      164380   73.8%
black      20206    9.1%
or         8063     3.6%
asian      5821     2.6%
pacific    5821     2.6%
islander   5821     2.6%
other      3645     1.6%
amer       2242     1.0%
indian     2242     1.0%
aleut      2242     1.0%

Most occurring characters

ValueCountFrequency (%)
222725
16.6%
i 186327
13.9%
e 178330
13.3%
t 170267
12.7%
h 168025
12.5%
W 164380
12.2%
a 39911
 
3.0%
c 31848
 
2.4%
l 28269
 
2.1%
k 22448
 
1.7%
Other values (14) 130978
9.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 906121
67.4%
Space Separator 222725
 
16.6%
Uppercase Letter 214662
 
16.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 186327
20.6%
e 178330
19.7%
t 170267
18.8%
h 168025
18.5%
a 39911
 
4.4%
c 31848
 
3.5%
l 28269
 
3.1%
k 22448
 
2.5%
r 19771
 
2.2%
n 16126
 
1.8%
Other values (6) 44799
 
4.9%
Uppercase Letter
ValueCountFrequency (%)
W 164380
76.6%
B 20206
 
9.4%
A 10305
 
4.8%
I 8063
 
3.8%
P 5821
 
2.7%
O 3645
 
1.7%
E 2242
 
1.0%
Space Separator
ValueCountFrequency (%)
222725
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1120783
83.4%
Common 222725
 
16.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 186327
16.6%
e 178330
15.9%
t 170267
15.2%
h 168025
15.0%
W 164380
14.7%
a 39911
 
3.6%
c 31848
 
2.8%
l 28269
 
2.5%
k 22448
 
2.0%
B 20206
 
1.8%
Other values (13) 110772
9.9%
Common
ValueCountFrequency (%)
222725
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1343508
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
222725
16.6%
i 186327
13.9%
e 178330
13.3%
t 170267
12.7%
h 168025
12.5%
W 164380
12.2%
a 39911
 
3.0%
c 31848
 
2.4%
l 28269
 
2.1%
k 22448
 
1.7%
Other values (14) 130978
9.7%

hispanic_origin
Categorical

High correlation  Imbalance 

Distinct 10
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value                       Count
All other                   168803
Mexican-American            8008
Mexican (Mexicano)          7210
Central or South American   3891
Puerto Rican                3306
Other values (5)            5076

Length

Max length 26
Median length 10
Mean length 10.980417
Min length 3

Characters and Unicode

Total characters 2155390
Distinct characters 31
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row All other
2nd row All other
3rd row All other
4th row All other
5th row All other

Common Values

Value                       Count    Frequency (%)
All other                   168803   86.0%
Mexican-American            8008     4.1%
Mexican (Mexicano)          7210     3.7%
Central or South American   3891     2.0%
Puerto Rican                3306     1.7%
Other Spanish               2476     1.3%
Cuban                       1122     0.6%
NA                          870      0.4%
Do not know                 305      0.2%
Chicano                     303      0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value              Count    Frequency (%)
other              171279   43.9%
all                168803   43.2%
mexican-american   8008     2.1%
mexican            7210     1.8%
mexicano           7210     1.8%
central            3891     1.0%
or                 3891     1.0%
south              3891     1.0%
american           3891     1.0%
rican              3306     0.8%
Other values (8)   8992     2.3%

Most occurring characters

ValueCountFrequency (%)
390372
18.1%
l 341497
15.8%
e 212803
9.9%
r 194266
9.0%
o 188319
8.7%
t 182672
8.5%
A 181572
8.4%
h 177949
8.3%
n 46035
 
2.1%
a 45425
 
2.1%
Other values (21) 194480
9.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1516644
70.4%
Space Separator 390372
 
18.1%
Uppercase Letter 225946
 
10.5%
Dash Punctuation 8008
 
0.4%
Open Punctuation 7210
 
0.3%
Close Punctuation 7210
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 341497
22.5%
e 212803
14.0%
r 194266
12.8%
o 188319
12.4%
t 182672
12.0%
h 177949
11.7%
n 46035
 
3.0%
a 45425
 
3.0%
i 40412
 
2.7%
c 37936
 
2.5%
Other values (8) 49330
 
3.3%
Uppercase Letter
ValueCountFrequency (%)
A 181572
80.4%
M 22428
 
9.9%
S 6367
 
2.8%
C 5316
 
2.4%
P 3306
 
1.5%
R 3306
 
1.5%
O 2476
 
1.1%
N 870
 
0.4%
D 305
 
0.1%
Space Separator
ValueCountFrequency (%)
390372
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 8008
100.0%
Open Punctuation
ValueCountFrequency (%)
( 7210
100.0%
Close Punctuation
ValueCountFrequency (%)
) 7210
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1742590
80.8%
Common 412800
 
19.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 341497
19.6%
e 212803
12.2%
r 194266
11.1%
o 188319
10.8%
t 182672
10.5%
A 181572
10.4%
h 177949
10.2%
n 46035
 
2.6%
a 45425
 
2.6%
i 40412
 
2.3%
Other values (17) 131640
 
7.6%
Common
ValueCountFrequency (%)
390372
94.6%
- 8008
 
1.9%
( 7210
 
1.7%
) 7210
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2155390
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
390372
18.1%
l 341497
15.8%
e 212803
9.9%
r 194266
9.0%
o 188319
8.7%
t 182672
8.5%
A 181572
8.4%
h 177949
8.3%
n 46035
 
2.1%
a 45425
 
2.1%
Other values (21) 194480
9.0%

sex
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value    Count
Female   102400
Male     93894

Length

Max length 7
Median length 7
Mean length 6.043333
Min length 5
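
These length statistics follow from the two labels and their counts. The sketch below reproduces the mean and median, assuming each stored label carries one leading space (so " Female" has 7 characters and " Male" has 5, which matches the reported maximum and minimum). The helper name `weighted_median` is hypothetical.

```python
# Label lengths mapped to row counts; the leading space in each label is an
# assumption inferred from the reported max (7) and min (5) lengths.
length_counts = {len(" Female"): 102400, len(" Male"): 93894}

total_rows = sum(length_counts.values())
total_chars = sum(length * n for length, n in length_counts.items())
mean_length = total_chars / total_rows


def weighted_median(counts):
    """Median of a value -> count mapping, found by walking the cumulative count."""
    half = sum(counts.values()) / 2
    running = 0
    for value in sorted(counts):
        running += counts[value]
        if running >= half:
            return value


median_length = weighted_median(length_counts)
print(total_chars, round(mean_length, 6), median_length)
```

The median equals the maximum here because the longer label, Female, accounts for more than half of the rows.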

Characters and Unicode

Total characters 1186270
Distinct characters 7
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Female
2nd row Male
3rd row Female
4th row Female
5th row Female

Common Values

Value    Count    Frequency (%)
Female   102400   52.2%
Male     93894    47.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count    Frequency (%)
female   102400   52.2%
male     93894    47.8%

Most occurring characters

ValueCountFrequency (%)
e 298694
25.2%
196294
16.5%
a 196294
16.5%
l 196294
16.5%
F 102400
 
8.6%
m 102400
 
8.6%
M 93894
 
7.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 793682
66.9%
Space Separator 196294
 
16.5%
Uppercase Letter 196294
 
16.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 298694
37.6%
a 196294
24.7%
l 196294
24.7%
m 102400
 
12.9%
Uppercase Letter
ValueCountFrequency (%)
F 102400
52.2%
M 93894
47.8%
Space Separator
ValueCountFrequency (%)
196294
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 989976
83.5%
Common 196294
 
16.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 298694
30.2%
a 196294
19.8%
l 196294
19.8%
F 102400
 
10.3%
m 102400
 
10.3%
M 93894
 
9.5%
Common
ValueCountFrequency (%)
196294
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186270
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 298694
25.2%
196294
16.5%
a 196294
16.5%
l 196294
16.5%
F 102400
 
8.6%
m 102400
 
8.6%
M 93894
 
7.9%

member_of_a_labor_union
Categorical

Imbalance 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value             Count
Not in universe   177232
No                16032
Yes               3030

Length

Max length 16
Median length 16
Mean length 14.753013
Min length 3

Characters and Unicode

Total characters 2895928
Distinct characters 12
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value             Count    Frequency (%)
Not in universe   177232   90.3%
No                16032    8.2%
Yes               3030     1.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count    Frequency (%)
not        177232   32.2%
in         177232   32.2%
universe   177232   32.2%
no         16032    2.9%
yes        3030     0.6%
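
The word counts above can be reproduced from the category counts by lowercasing each label and splitting it on whitespace. That tokenization rule is an assumption about how the table was built, sketched here with the standard library only.

```python
from collections import Counter

# Category counts copied from the Common Values table.
value_counts = {"Not in universe": 177232, "No": 16032, "Yes": 3030}

word_counts = Counter()
for label, n in value_counts.items():
    for word in label.lower().split():
        word_counts[word] += n  # every row contributes each word of its label

total_words = sum(word_counts.values())
for word, n in word_counts.most_common():
    print(f"{word:10s}{n:>8d}{100 * n / total_words:6.1f}%")
```

Percentages are taken over all words rather than over rows, which is why "not" accounts for roughly a third of the total even though its label covers about 90% of the rows.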

Most occurring characters

ValueCountFrequency (%)
550758
19.0%
e 357494
12.3%
i 354464
12.2%
n 354464
12.2%
N 193264
 
6.7%
o 193264
 
6.7%
s 180262
 
6.2%
t 177232
 
6.1%
u 177232
 
6.1%
v 177232
 
6.1%
Other values (2) 180262
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2148876
74.2%
Space Separator 550758
 
19.0%
Uppercase Letter 196294
 
6.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 357494
16.6%
i 354464
16.5%
n 354464
16.5%
o 193264
9.0%
s 180262
8.4%
t 177232
8.2%
u 177232
8.2%
v 177232
8.2%
r 177232
8.2%
Uppercase Letter
ValueCountFrequency (%)
N 193264
98.5%
Y 3030
 
1.5%
Space Separator
ValueCountFrequency (%)
550758
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2345170
81.0%
Common 550758
 
19.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 357494
15.2%
i 354464
15.1%
n 354464
15.1%
N 193264
8.2%
o 193264
8.2%
s 180262
7.7%
t 177232
7.6%
u 177232
7.6%
v 177232
7.6%
r 177232
7.6%
Common
ValueCountFrequency (%)
550758
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2895928
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
550758
19.0%
e 357494
12.3%
i 354464
12.2%
n 354464
12.2%
N 193264
 
6.7%
o 193264
 
6.7%
s 180262
 
6.2%
t 177232
 
6.1%
u 177232
 
6.1%
v 177232
 
6.1%
Other values (2) 180262
 
6.2%

reason_for_unemployment
Categorical

Imbalance 

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value             Count
Not in universe   190226
Job loser         3014
Re-entrant        2018
Job leaver        598
New entrant       438

Length

Max length 15
Median length 15
Mean length 14.832313
Min length 9

Characters and Unicode

Total characters 2911494
Distinct characters 18
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value             Count    Frequency (%)
Not in universe   190226   96.9%
Job loser         3014     1.5%
Re-entrant        2018     1.0%
Job leaver        598      0.3%
New entrant       438      0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value        Count    Frequency (%)
not          190226   32.8%
in           190226   32.8%
universe     190226   32.8%
job          3612     0.6%
loser        3014     0.5%
re-entrant   2018     0.3%
leaver       598      0.1%
new          438      0.1%
entrant      438      0.1%

Most occurring characters

ValueCountFrequency (%)
e 389574
13.4%
n 385364
13.2%
384502
13.2%
i 380452
13.1%
o 196852
6.8%
r 196294
6.7%
t 195138
6.7%
s 193240
6.6%
v 190824
6.6%
N 190664
6.5%
Other values (8) 208590
7.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2328680
80.0%
Space Separator 384502
 
13.2%
Uppercase Letter 196294
 
6.7%
Dash Punctuation 2018
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 389574
16.7%
n 385364
16.5%
i 380452
16.3%
o 196852
8.5%
r 196294
8.4%
t 195138
8.4%
s 193240
8.3%
v 190824
8.2%
u 190226
8.2%
b 3612
 
0.2%
Other values (3) 7104
 
0.3%
Uppercase Letter
ValueCountFrequency (%)
N 190664
97.1%
J 3612
 
1.8%
R 2018
 
1.0%
Space Separator
ValueCountFrequency (%)
384502
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2018
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2524974
86.7%
Common 386520
 
13.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 389574
15.4%
n 385364
15.3%
i 380452
15.1%
o 196852
7.8%
r 196294
7.8%
t 195138
7.7%
s 193240
7.7%
v 190824
7.6%
N 190664
7.6%
u 190226
7.5%
Other values (6) 16346
 
0.6%
Common
ValueCountFrequency (%)
384502
99.5%
- 2018
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2911494
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 389574
13.4%
n 385364
13.2%
384502
13.2%
i 380452
13.1%
o 196852
6.8%
r 196294
6.7%
t 195138
6.7%
s 193240
6.6%
v 190824
6.6%
N 190664
6.5%
Other values (8) 208590
7.2%

full_or_part_time_employment_stat
Categorical

High correlation 

Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value                      Count
Children or Armed Forces   120632
FTE                        43038
Not Employed               26726
PTE                        5898

Length

Max length 24
Median length 24
Mean length 17.130875
Min length 3

Characters and Unicode

Total characters 3362688
Distinct characters 22
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not Employed
2nd row Children or Armed Forces
3rd row Not Employed
4th row Children or Armed Forces
5th row Children or Armed Forces

Common Values

Value                      Count    Frequency (%)
Children or Armed Forces   120632   61.5%
FTE                        43038    21.9%
Not Employed               26726    13.6%
PTE                        5898     3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count    Frequency (%)
children   120632   20.6%
or         120632   20.6%
armed      120632   20.6%
forces     120632   20.6%
fte        43038    7.4%
not        26726    4.6%
employed   26726    4.6%
pte        5898     1.0%

Most occurring characters

ValueCountFrequency (%)
r 482528
14.3%
e 388622
11.6%
388622
11.6%
o 294716
 
8.8%
d 267990
 
8.0%
F 163670
 
4.9%
m 147358
 
4.4%
l 147358
 
4.4%
h 120632
 
3.6%
s 120632
 
3.6%
Other values (12) 840560
25.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2411910
71.7%
Uppercase Letter 562156
 
16.7%
Space Separator 388622
 
11.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 482528
20.0%
e 388622
16.1%
o 294716
12.2%
d 267990
11.1%
m 147358
 
6.1%
l 147358
 
6.1%
h 120632
 
5.0%
s 120632
 
5.0%
c 120632
 
5.0%
n 120632
 
5.0%
Other values (4) 200810
8.3%
Uppercase Letter
ValueCountFrequency (%)
F 163670
29.1%
C 120632
21.5%
A 120632
21.5%
E 75662
13.5%
T 48936
 
8.7%
N 26726
 
4.8%
P 5898
 
1.0%
Space Separator
ValueCountFrequency (%)
388622
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2974066
88.4%
Common 388622
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 482528
16.2%
e 388622
13.1%
o 294716
9.9%
d 267990
 
9.0%
F 163670
 
5.5%
m 147358
 
5.0%
l 147358
 
5.0%
h 120632
 
4.1%
s 120632
 
4.1%
c 120632
 
4.1%
Other values (11) 719928
24.2%
Common
ValueCountFrequency (%)
388622
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3362688
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 482528
14.3%
e 388622
11.6%
388622
11.6%
o 294716
 
8.8%
d 267990
 
8.0%
F 163670
 
4.9%
m 147358
 
4.4%
l 147358
 
4.4%
h 120632
 
3.6%
s 120632
 
3.6%
Other values (12) 840560
25.0%

capital_gains
Real number (ℝ)

Zeros 

Distinct 132
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 441.87004
Minimum 0
Maximum 99999
Zeros 188915
Zeros (%) 96.2%
Negative 0
Negative (%) 0.0%
Memory size 3.0 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 0
Maximum 99999
Range 99999
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 4735.677
Coefficient of variation (CV) 10.717353
Kurtosis 386.64929
Mean 441.87004
Median Absolute Deviation (MAD) 0
Skewness 18.835992
Sum 86736437
Variance 22426637
Monotonicity Not monotonic
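
Several of these figures are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the zero share is the zero count divided by the number of observations. The sketch below checks those identities against the reported values, using the observation count of 196294 from the dataset overview. Variable names are illustrative, and the tolerance is an assumption chosen to absorb rounding in the printed figures.

```python
import math

n_rows = 196294          # number of observations, from the dataset overview
total = 86736437         # Sum
mean = 441.87004         # Mean
std = 4735.677           # Standard deviation
variance = 22426637      # Variance
cv = 10.717353           # Coefficient of variation (CV)
zeros = 188915           # Zeros

checks = {
    "mean = sum / n": math.isclose(total / n_rows, mean, rel_tol=1e-6),
    "variance = std^2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "zeros (%) = zeros / n": round(100 * zeros / n_rows, 1) == 96.2,
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'mismatch'}")
```

A coefficient of variation above 10 alongside a median of 0 is what a column looks like when almost all rows are zero and a few carry very large values.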
Histogram with fixed size bins (bins=50)

Value                Count    Frequency (%)
0                    188915   96.2%
15024                788      0.4%
7688                 609      0.3%
7298                 582      0.3%
99999                390      0.2%
3103                 237      0.1%
5178                 207      0.1%
5013                 158      0.1%
4386                 151      0.1%
3325                 121      0.1%
Other values (122)   4136     2.1%

Value   Count    Frequency (%)
0       188915   96.2%
114     11       < 0.1%
401     33       < 0.1%
594     88       < 0.1%
914     17       < 0.1%
991     59       < 0.1%
1055    69       < 0.1%
1086    81       < 0.1%
1090    2        < 0.1%
1111    4        < 0.1%

Value   Count   Frequency (%)
99999   390     0.2%
41310   2       < 0.1%
34095   11      < 0.1%
27828   94      < 0.1%
25236   23      < 0.1%
25124   18      < 0.1%
22040   2       < 0.1%
20051   91      < 0.1%
18481   14      < 0.1%
15831   16      < 0.1%

capital_losses
Real number (ℝ)

Zeros 

Distinct 113
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 37.927593
Minimum 0
Maximum 4608
Zeros 192388
Zeros (%) 98.0%
Negative 0
Negative (%) 0.0%
Memory size 3.0 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 0
Maximum 4608
Range 4608
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 274.08117
Coefficient of variation (CV) 7.226432
Kurtosis 60.558557
Mean 37.927593
Median Absolute Deviation (MAD) 0
Skewness 7.567395
Sum 7444959
Variance 75120.49
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count    Frequency (%)
0                    192388   98.0%
1902                 407      0.2%
1977                 381      0.2%
1887                 364      0.2%
1602                 193      0.1%
2415                 122      0.1%
1485                 95       < 0.1%
1848                 88       < 0.1%
1876                 87       < 0.1%
1672                 85       < 0.1%
Other values (103)   2084     1.1%

Value   Count    Frequency (%)
0       192388   98.0%
155     1        < 0.1%
213     10       < 0.1%
323     10       < 0.1%
419     29       < 0.1%
625     25       < 0.1%
653     7        < 0.1%
772     5        < 0.1%
810     5        < 0.1%
880     9        < 0.1%

Value   Count   Frequency (%)
4608    4       < 0.1%
4356    30      < 0.1%
3900    2       < 0.1%
3770    5       < 0.1%
3683    4       < 0.1%
3500    10      < 0.1%
3175    8       < 0.1%
3004    11      < 0.1%
2824    27      < 0.1%
2788    7       < 0.1%

dividends_from_stocks
Real number (ℝ)

Skewed  Zeros 

Distinct 1478
Distinct (%) 0.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 200.72239
Minimum 0
Maximum 99999
Zeros 175156
Zeros (%) 89.2%
Negative 0
Negative (%) 0.0%
Memory size 3.0 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 400
Maximum 99999
Range 99999
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 2000.1306
Coefficient of variation (CV) 9.9646614
Kurtosis 1073.3032
Mean 200.72239
Median Absolute Deviation (MAD) 0
Skewness 27.567201
Sum 39400600
Variance 4000522.5
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count    Frequency (%)
0                     175156   89.2%
100                   1148     0.6%
500                   1030     0.5%
1000                  894      0.5%
200                   866      0.4%
50                    831      0.4%
2000                  574      0.3%
250                   555      0.3%
150                   549      0.3%
300                   523      0.3%
Other values (1468)   14168    7.2%

Value   Count    Frequency (%)
0       175156   89.2%
1       472      0.2%
2       193      0.1%
3       129      0.1%
4       75       < 0.1%
5       179      0.1%
6       100      0.1%
7       93       < 0.1%
8       94       < 0.1%
9       56       < 0.1%

Value   Count   Frequency (%)
99999   25      < 0.1%
95095   1       < 0.1%
75000   5       < 0.1%
70000   3       < 0.1%
66621   2       < 0.1%
60000   7       < 0.1%
57678   1       < 0.1%
55000   1       < 0.1%
54600   2       < 0.1%
54500   2       < 0.1%

tax_filer_stat
Categorical

High correlation 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value              Count
Joint Filer        79557
Non-Filer          71903
Individual Filer   44834

Length

Max length 16
Median length 11
Mean length 11.409406
Min length 9

Characters and Unicode

Total characters 2239598
Distinct characters 17
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Non-Filer
2nd row Individual Filer
3rd row Non-Filer
4th row Non-Filer
5th row Non-Filer

Common Values

Value              Count   Frequency (%)
Joint Filer        79557   40.5%
Non-Filer          71903   36.6%
Individual Filer   44834   22.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value        Count    Frequency (%)
filer        124391   38.8%
joint        79557    24.8%
non-filer    71903    22.4%
individual   44834    14.0%

Most occurring characters

ValueCountFrequency (%)
i 365519
16.3%
l 241128
10.8%
e 196294
8.8%
r 196294
8.8%
n 196294
8.8%
F 196294
8.8%
o 151460
 
6.8%
124391
 
5.6%
d 89668
 
4.0%
J 79557
 
3.6%
Other values (7) 402699
18.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1650716
73.7%
Uppercase Letter 392588
 
17.5%
Space Separator 124391
 
5.6%
Dash Punctuation 71903
 
3.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 365519
22.1%
l 241128
14.6%
e 196294
11.9%
r 196294
11.9%
n 196294
11.9%
o 151460
9.2%
d 89668
 
5.4%
t 79557
 
4.8%
v 44834
 
2.7%
u 44834
 
2.7%
Uppercase Letter
ValueCountFrequency (%)
F 196294
50.0%
J 79557
20.3%
N 71903
 
18.3%
I 44834
 
11.4%
Space Separator
ValueCountFrequency (%)
124391
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 71903
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2043304
91.2%
Common 196294
 
8.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 365519
17.9%
l 241128
11.8%
e 196294
9.6%
r 196294
9.6%
n 196294
9.6%
F 196294
9.6%
o 151460
7.4%
d 89668
 
4.4%
J 79557
 
3.9%
t 79557
 
3.9%
Other values (5) 251239
12.3%
Common
ValueCountFrequency (%)
124391
63.4%
- 71903
36.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2239598
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 365519
16.3%
l 241128
10.8%
e 196294
8.8%
r 196294
8.8%
n 196294
8.8%
F 196294
8.8%
o 151460
 
6.8%
124391
 
5.6%
d 89668
 
4.0%
J 79557
 
3.6%
Other values (7) 402699
18.0%

region_of_previous_residence
Categorical

High correlation  Imbalance 

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value             Count
Not in universe   180562
South             4875
West              4068
Midwest           3559
Northeast         2700

Length

Max length 16
Median length 16
Mean length 15.271807
Min length 5

Characters and Unicode

Total characters 2997764
Distinct characters 20
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row South
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value             Count    Frequency (%)
Not in universe   180562   92.0%
South             4875     2.5%
West              4068     2.1%
Midwest           3559     1.8%
Northeast         2700     1.4%
Abroad            530      0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count    Frequency (%)
not         180562   32.4%
in          180562   32.4%
universe    180562   32.4%
south       4875     0.9%
west        4068     0.7%
midwest     3559     0.6%
northeast   2700     0.5%
abroad      530      0.1%

Most occurring characters

ValueCountFrequency (%)
557418
18.6%
e 371451
12.4%
i 364683
12.2%
n 361124
12.0%
t 198464
 
6.6%
s 190889
 
6.4%
o 188667
 
6.3%
u 185437
 
6.2%
r 183792
 
6.1%
N 183262
 
6.1%
Other values (10) 212577
 
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2244052
74.9%
Space Separator 557418
 
18.6%
Uppercase Letter 196294
 
6.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 371451
16.6%
i 364683
16.3%
n 361124
16.1%
t 198464
8.8%
s 190889
8.5%
o 188667
8.4%
u 185437
8.3%
r 183792
8.2%
v 180562
8.0%
h 7575
 
0.3%
Other values (4) 11408
 
0.5%
Uppercase Letter
ValueCountFrequency (%)
N 183262
93.4%
S 4875
 
2.5%
W 4068
 
2.1%
M 3559
 
1.8%
A 530
 
0.3%
Space Separator
ValueCountFrequency (%)
557418
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2440346
81.4%
Common 557418
 
18.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 371451
15.2%
i 364683
14.9%
n 361124
14.8%
t 198464
8.1%
s 190889
7.8%
o 188667
7.7%
u 185437
7.6%
r 183792
7.5%
N 183262
7.5%
v 180562
7.4%
Other values (9) 32015
 
1.3%
Common
ValueCountFrequency (%)
557418
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2997764
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
557418
18.6%
e 371451
12.4%
i 364683
12.2%
n 361124
12.0%
t 198464
 
6.6%
s 190889
 
6.4%
o 188667
 
6.3%
u 185437
 
6.2%
r 183792
 
6.1%
N 183262
 
6.1%
Other values (10) 212577
 
7.1%
Distinct 51
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Length

Max length 21
Median length 16
Mean length 15.449194
Min length 2

Characters and Unicode

Total characters 3032584
Distinct characters 46
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Arkansas
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Value               Count    Frequency (%)
not                 180562   32.2%
universe            180562   32.2%
in                  180562   32.2%
california          1710     0.3%
north               1307     0.2%
utah                1061     0.2%
new                 974      0.2%
carolina            905      0.2%
florida             847      0.2%
?                   707      0.1%
Other values (46)   11192    2.0%

Most occurring characters

ValueCountFrequency (%)
560389
18.5%
i 373914
12.3%
n 370809
12.2%
e 366790
12.1%
o 192226
 
6.3%
r 188882
 
6.2%
s 186123
 
6.1%
t 186023
 
6.1%
N 183194
 
6.0%
u 181785
 
6.0%
Other values (36) 242449
8.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2273043
75.0%
Space Separator 560389
 
18.5%
Uppercase Letter 198445
 
6.5%
Other Punctuation 707
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 373914
16.4%
n 370809
16.3%
e 366790
16.1%
o 192226
8.5%
r 188882
8.3%
s 186123
8.2%
t 186023
8.2%
u 181785
8.0%
v 180935
8.0%
a 18992
 
0.8%
Other values (14) 26564
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
N 183194
92.3%
C 3084
 
1.6%
M 2531
 
1.3%
A 1623
 
0.8%
O 1069
 
0.5%
U 1061
 
0.5%
I 927
 
0.5%
F 847
 
0.4%
D 821
 
0.4%
W 577
 
0.3%
Other values (10) 2711
 
1.4%
Space Separator
ValueCountFrequency (%)
560389
100.0%
Other Punctuation
ValueCountFrequency (%)
? 707
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2471488
81.5%
Common 561096
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 373914
15.1%
n 370809
15.0%
e 366790
14.8%
o 192226
7.8%
r 188882
7.6%
s 186123
7.5%
t 186023
7.5%
N 183194
7.4%
u 181785
7.4%
v 180935
7.3%
Other values (34) 60807
 
2.5%
Common
ValueCountFrequency (%)
560389
99.9%
? 707
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3032584
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
560389
18.5%
i 373914
12.3%
n 370809
12.2%
e 366790
12.1%
o 192226
 
6.3%
r 188882
 
6.2%
s 186123
 
6.1%
t 186023
 
6.1%
N 183194
 
6.0%
u 181785
 
6.0%
Other values (36) 242449
8.0%

detailed_household_and_family_stat
Categorical

High correlation 

Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.0 MiB

Value                 Count
Primary Householder   117117
Child                 62490
Extended Family       9646
Other                 7041

Length

Max length 19
Median length 19
Mean length 13.844376
Min length 5

Characters and Unicode

Total characters 2717568
Distinct characters 22
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Extended Family
2nd row Primary Householder
3rd row Child
4th row Child
5th row Child

Common Values

Value                 Count    Frequency (%)
Primary Householder   117117   59.7%
Child                 62490    31.8%
Extended Family       9646     4.9%
Other                 7041     3.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count    Frequency (%)
primary       117117   36.3%
householder   117117   36.3%
child         62490    19.3%
extended      9646     3.0%
family        9646     3.0%
other         7041     2.2%

Most occurring characters

ValueCountFrequency (%)
r 358392
13.2%
e 260567
 
9.6%
o 234234
 
8.6%
d 198899
 
7.3%
i 189253
 
7.0%
l 189253
 
7.0%
h 186648
 
6.9%
m 126763
 
4.7%
a 126763
 
4.7%
y 126763
 
4.7%
Other values (12) 720033
26.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2267748
83.4%
Uppercase Letter 323057
 
11.9%
Space Separator 126763
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 358392
15.8%
e 260567
11.5%
o 234234
10.3%
d 198899
8.8%
i 189253
8.3%
l 189253
8.3%
h 186648
8.2%
m 126763
 
5.6%
a 126763
 
5.6%
y 126763
 
5.6%
Other values (5) 270213
11.9%
Uppercase Letter
ValueCountFrequency (%)
P 117117
36.3%
H 117117
36.3%
C 62490
19.3%
E 9646
 
3.0%
F 9646
 
3.0%
O 7041
 
2.2%
Space Separator
ValueCountFrequency (%)
126763
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2590805
95.3%
Common 126763
 
4.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 358392
13.8%
e 260567
10.1%
o 234234
 
9.0%
d 198899
 
7.7%
i 189253
 
7.3%
l 189253
 
7.3%
h 186648
 
7.2%
m 126763
 
4.9%
a 126763
 
4.9%
y 126763
 
4.9%
Other values (11) 593270
22.9%
Common
ValueCountFrequency (%)
126763
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2717568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 358392
13.2%
e 260567
 
9.6%
o 234234
 
8.6%
d 198899
 
7.3%
i 189253
 
7.0%
l 189253
 
7.0%
h 186648
 
6.9%
m 126763
 
4.7%
a 126763
 
4.7%
y 126763
 
4.7%
Other values (12) 720033
26.5%

detailed_household_summary_in_household
Categorical

High correlation 

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Householder
75461 
Child under 18 never married
47318 
Spouse of householder
41684 
Child 18 or older
14416 
Other relative of householder
9651 
Other values (3)
7764 

Length

Max length37
Median length30
Mean length20.147406
Min length12

Characters and Unicode

Total characters3954815
Distinct characters29
Distinct categories5
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row Other relative of householder
2nd row Householder
3rd row Child 18 or older
4th row Child under 18 never married
5th row Child under 18 never married

Common Values

Value                                  Count   Frequency (%)
Householder                            75461   38.4%
Child under 18 never married           47318   24.1%
Spouse of householder                  41684   21.2%
Child 18 or older                      14416   7.3%
Other relative of householder          9651    4.9%
Nonrelative of householder             7585    3.9%
Group Quarters- Secondary individual   132     0.1%
Child under 18 ever married            47      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
householder 134381
24.1%
child 61781
11.1%
18 61781
11.1%
of 58920
10.6%
under 47365
 
8.5%
married 47365
 
8.5%
never 47318
 
8.5%
spouse 41684
 
7.5%
older 14416
 
2.6%
or 14416
 
2.6%
Other values (8) 27462
 
4.9%
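The word counts above look like the result of lower-casing each category value, splitting it on whitespace, and weighting every token by the number of rows that hold the value. The sketch below reproduces several rows under that assumption. It is not the profiler's own code, and `value_counts` is typed in by hand from this variable's Common Values table.

```python
from collections import Counter

# Row counts copied by hand from the Common Values table of
# detailed_household_summary_in_household.
value_counts = {
    "Householder": 75461,
    "Child under 18 never married": 47318,
    "Spouse of householder": 41684,
    "Child 18 or older": 14416,
    "Other relative of householder": 9651,
    "Nonrelative of householder": 7585,
    "Group Quarters- Secondary individual": 132,
    "Child under 18 ever married": 47,
}

def word_totals(counts):
    """Count lower-cased, whitespace-separated tokens, weighted by row count."""
    totals = Counter()
    for value, n in counts.items():
        for token in value.lower().split():
            totals[token] += n
    return totals

words = word_totals(value_counts)
for token in ("householder", "child", "18", "of", "under", "married"):
    print(token, words[token])
```

With this tokenisation, `householder` sums the four values that contain the word, which is why its count exceeds the row count of the `Householder` category alone.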

Most occurring characters

ValueCountFrequency (%)
e 558709
14.1%
(space) 556889
14.1%
o 406047
10.3%
r 380088
9.6%
d 305704
7.7%
h 264733
 
6.7%
l 227946
 
5.8%
u 223826
 
5.7%
s 176197
 
4.5%
i 126778
 
3.2%
Other values (19) 727898
18.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3077674
77.8%
Space Separator 556889
 
14.1%
Uppercase Letter 196558
 
5.0%
Decimal Number 123562
 
3.1%
Dash Punctuation 132
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 558709
18.2%
o 406047
13.2%
r 380088
12.3%
d 305704
9.9%
h 264733
8.6%
l 227946
7.4%
u 223826
7.3%
s 176197
 
5.7%
i 126778
 
4.1%
n 102532
 
3.3%
Other values (8) 305114
9.9%
Uppercase Letter
ValueCountFrequency (%)
H 75461
38.4%
C 61781
31.4%
S 41816
21.3%
O 9651
 
4.9%
N 7585
 
3.9%
G 132
 
0.1%
Q 132
 
0.1%
Decimal Number
ValueCountFrequency (%)
8 61781
50.0%
1 61781
50.0%
Space Separator
ValueCountFrequency (%)
(space) 556889
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 132
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3274232
82.8%
Common 680583
 
17.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 558709
17.1%
o 406047
12.4%
r 380088
11.6%
d 305704
9.3%
h 264733
8.1%
l 227946
7.0%
u 223826
6.8%
s 176197
 
5.4%
i 126778
 
3.9%
n 102532
 
3.1%
Other values (15) 501672
15.3%
Common
ValueCountFrequency (%)
(space) 556889
81.8%
8 61781
 
9.1%
1 61781
 
9.1%
- 132
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3954815
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 558709
14.1%
(space) 556889
14.1%
o 406047
10.3%
r 380088
9.6%
d 305704
7.7%
h 264733
 
6.7%
l 227946
 
5.8%
u 223826
 
5.7%
s 176197
 
4.5%
i 126778
 
3.2%
Other values (19) 727898
18.4%

instance_weight
Real number (ℝ)

Distinct99800
Distinct (%)50.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1743.2676
Minimum37.87
Maximum18656.3
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum: 37.87
5-th percentile: 394.9995
Q1: 1061.53
Median: 1620.175
Q3: 2194.06
95-th percentile: 3593.14
Maximum: 18656.3
Range: 18618.43
Interquartile range (IQR): 1132.53

Descriptive statistics

Standard deviation996.94598
Coefficient of variation (CV)0.57188351
Kurtosis5.395758
Mean1743.2676
Median Absolute Deviation (MAD)564.265
Skewness1.4314984
Sum3.4219297 × 10^8
Variance993901.3
MonotonicityNot monotonic
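Several of the figures above are derived from others in the same report, so they can be cross-checked. The sketch below recomputes the coefficient of variation, interquartile range, range and variance from the reported mean, standard deviation, quartiles and extremes. The variable names are illustrative, and the inputs are the rounded values printed in the report, so agreement is only expected up to rounding.

```python
# Inputs copied from the instance_weight statistics in this report.
mean = 1743.2676
std = 996.94598
q1, q3 = 1061.53, 2194.06
minimum, maximum = 37.87, 18656.3

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

print(f"CV={cv:.8f} IQR={iqr:.2f} range={value_range:.2f} variance={variance:.1f}")
```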
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
1601.4                 32       < 0.1%
1191.21                32       < 0.1%
753.23                 32       < 0.1%
707.9                  31       < 0.1%
1787.34                31       < 0.1%
1317.51                31       < 0.1%
1070.15                30       < 0.1%
1033.83                28       < 0.1%
1002.02                28       < 0.1%
1839.19                28       < 0.1%
Other values (99790)   195991   99.8%

Value     Count   Frequency (%)
37.87     1       < 0.1%
39.11     1       < 0.1%
40.67     2       < 0.1%
42.82     2       < 0.1%
43.26     3       < 0.1%
45.74     2       < 0.1%
47.83     6       < 0.1%
49.82     2       < 0.1%
52.43     1       < 0.1%
52.46     4       < 0.1%

Value     Count   Frequency (%)
18656.3   1       < 0.1%
16349.2   1       < 0.1%
13911.5   1       < 0.1%
13145.1   1       < 0.1%
13114.2   1       < 0.1%
12960.2   1       < 0.1%
12399.9   1       < 0.1%
12184.5   1       < 0.1%
11958.4   1       < 0.1%
11863     1       < 0.1%

migration_code_change_in_msa
Categorical

High correlation 

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe
99864 
No movement
81128 
MSA movement
10572 
Non-MSA movement
 
2802
Mixed movement
 
1402

Length

Max length16
Median length15
Mean length13.187005
Min length11

Characters and Unicode

Total characters2588530
Distinct characters21
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
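To make the sentence above concrete, the per-category character totals for this variable can be reproduced with Python's standard `unicodedata` module. This is a minimal sketch rather than the profiler's own code: `value_counts` is typed in by hand from this variable's Common Values table, and the values are assumed to carry no leading or trailing whitespace.

```python
import unicodedata
from collections import Counter

# Row counts copied by hand from the Common Values table of
# migration_code_change_in_msa (assumed free of surrounding whitespace).
value_counts = {
    "Not in universe": 99864,
    "No movement": 81128,
    "MSA movement": 10572,
    "Non-MSA movement": 2802,
    "Mixed movement": 1402,
    "International": 526,
}

def category_totals(counts):
    """Sum Unicode general categories over all rows, weighted by row count."""
    totals = Counter()
    for value, n in counts.items():
        for ch in value:
            totals[unicodedata.category(ch)] += n
    return totals

totals = category_totals(value_counts)
# Ll = Lowercase Letter, Zs = Space Separator,
# Lu = Uppercase Letter, Pd = Dash Punctuation
for code in ("Ll", "Zs", "Lu", "Pd"):
    print(code, totals[code])
```

Under these assumptions the four totals match the "Most occurring categories" table for this variable, and their sum equals the reported total character count.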

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot in universe
2nd rowMSA movement
3rd rowNot in universe
4th rowNo movement
5th rowNo movement

Common Values

Value              Count   Frequency (%)
Not in universe    99864   50.9%
No movement        81128   41.3%
MSA movement       10572   5.4%
Non-MSA movement   2802    1.4%
Mixed movement     1402    0.7%
International      526     0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 99864
20.3%
in 99864
20.3%
universe 99864
20.3%
movement 95904
19.5%
no 81128
16.5%
msa 10572
 
2.1%
non-msa 2802
 
0.6%
mixed 1402
 
0.3%
international 526
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e 393464
15.2%
n 300012
11.6%
(space) 295632
11.4%
o 280224
10.8%
i 201656
7.8%
t 196820
7.6%
v 195768
7.6%
m 191808
7.4%
N 183794
7.1%
r 100390
 
3.9%
Other values (11) 248962
9.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2064252
79.7%
Space Separator 295632
 
11.4%
Uppercase Letter 225844
 
8.7%
Dash Punctuation 2802
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 393464
19.1%
n 300012
14.5%
o 280224
13.6%
i 201656
9.8%
t 196820
9.5%
v 195768
9.5%
m 191808
9.3%
r 100390
 
4.9%
s 99864
 
4.8%
u 99864
 
4.8%
Other values (4) 4382
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
N 183794
81.4%
M 14776
 
6.5%
S 13374
 
5.9%
A 13374
 
5.9%
I 526
 
0.2%
Space Separator
ValueCountFrequency (%)
(space) 295632
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2802
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2290096
88.5%
Common 298434
 
11.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 393464
17.2%
n 300012
13.1%
o 280224
12.2%
i 201656
8.8%
t 196820
8.6%
v 195768
8.5%
m 191808
8.4%
N 183794
8.0%
r 100390
 
4.4%
s 99864
 
4.4%
Other values (9) 146296
 
6.4%
Common
ValueCountFrequency (%)
(space) 295632
99.1%
- 2802
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2588530
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 393464
15.2%
n 300012
11.6%
(space) 295632
11.4%
o 280224
10.8%
i 201656
7.8%
t 196820
7.6%
v 195768
7.6%
m 191808
7.4%
N 183794
7.1%
r 100390
 
3.9%
Other values (11) 248962
9.6%

migration_code_change_in_reg
Categorical

High correlation 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe
99434 
Same area
90907 
Different area
 
5953

Length

Max length15
Median length15
Mean length12.190974
Min length9

Characters and Unicode

Total characters2393015
Distinct characters16
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot in universe
2nd rowSame area
3rd rowNot in universe
4th rowSame area
5th rowSame area

Common Values

Value             Count   Frequency (%)
Not in universe   99434   50.7%
Same area         90907   46.3%
Different area    5953    3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 99434
20.2%
in 99434
20.2%
universe 99434
20.2%
area 96860
19.7%
same 90907
18.5%
different 5953
 
1.2%

Most occurring characters

ValueCountFrequency (%)
e 398541
16.7%
(space) 295728
12.4%
a 284627
11.9%
i 204821
8.6%
n 204821
8.6%
r 202247
8.5%
t 105387
 
4.4%
N 99434
 
4.2%
o 99434
 
4.2%
u 99434
 
4.2%
Other values (6) 398541
16.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1900993
79.4%
Space Separator 295728
 
12.4%
Uppercase Letter 196294
 
8.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 398541
21.0%
a 284627
15.0%
i 204821
10.8%
n 204821
10.8%
r 202247
10.6%
t 105387
 
5.5%
o 99434
 
5.2%
u 99434
 
5.2%
v 99434
 
5.2%
s 99434
 
5.2%
Other values (2) 102813
 
5.4%
Uppercase Letter
ValueCountFrequency (%)
N 99434
50.7%
S 90907
46.3%
D 5953
 
3.0%
Space Separator
ValueCountFrequency (%)
(space) 295728
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2097287
87.6%
Common 295728
 
12.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 398541
19.0%
a 284627
13.6%
i 204821
9.8%
n 204821
9.8%
r 202247
9.6%
t 105387
 
5.0%
N 99434
 
4.7%
o 99434
 
4.7%
u 99434
 
4.7%
v 99434
 
4.7%
Other values (5) 299107
14.3%
Common
ValueCountFrequency (%)
(space) 295728
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2393015
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 398541
16.7%
(space) 295728
12.4%
a 284627
11.9%
i 204821
8.6%
n 204821
8.6%
r 202247
8.5%
t 105387
 
4.4%
N 99434
 
4.2%
o 99434
 
4.2%
u 99434
 
4.2%
Other values (6) 398541
16.7%

migration_code_move_within_reg
Categorical

High correlation  Imbalance 

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
?
98015 
Nonmover
81128 
Same county
 
9779
Different county same state
 
2792
Not in universe
 
1419
Other values (5)
 
3161

Length

Max length29
Median length28
Mean length6.1949881
Min length2

Characters and Unicode

Total characters1216039
Distinct characters26
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row ?
2nd row Same county
3rd row ?
4th row Nonmover
5th row Nonmover

Common Values

Value                          Count   Frequency (%)
?                              98015   49.9%
Nonmover                       81128   41.3%
Same county                    9779    5.0%
Different county same state    2792    1.4%
Not in universe                1419    0.7%
Different state in South       972     0.5%
Different state in West        678     0.3%
Different state in Midwest     551     0.3%
Abroad                         530     0.3%
Different state in Northeast   430     0.2%
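The report lists 0 missing cells for this variable, yet the value `?` covers 49.9% of rows. If `?` is a placeholder for an unknown answer (an assumption; the report does not say so), a reader may want to recode it before analysis. The sketch below shows one way to do that on the tabulated counts. The helper name `normalise` is hypothetical, and the leading spaces in the keys reflect the padded values visible in the Sample rows.

```python
# Row counts copied by hand from the Common Values table of
# migration_code_move_within_reg; keys keep the leading space
# that the Sample rows suggest is present in the raw data.
value_counts = {
    " ?": 98015,
    " Nonmover": 81128,
    " Same county": 9779,
    " Different county same state": 2792,
    " Not in universe": 1419,
    " Different state in South": 972,
    " Different state in West": 678,
    " Different state in Midwest": 551,
    " Abroad": 530,
    " Different state in Northeast": 430,
}

def normalise(raw):
    """Strip padding and map the assumed placeholder '?' to None."""
    stripped = raw.strip()
    return None if stripped == "?" else stripped

cleaned = {}
for raw, n in value_counts.items():
    key = normalise(raw)
    cleaned[key] = cleaned.get(key, 0) + n

total_rows = sum(cleaned.values())
missing = cleaned.get(None, 0)
print(f"{missing} of {total_rows} rows ({100 * missing / total_rows:.1f}%) treated as missing")
```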

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
(blank token) 98015
43.5%
nonmover 81128
36.0%
same 12571
 
5.6%
county 12571
 
5.6%
different 5423
 
2.4%
state 5423
 
2.4%
in 4050
 
1.8%
not 1419
 
0.6%
universe 1419
 
0.6%
south 972
 
0.4%
Other values (4) 2189
 
1.0%

Most occurring characters

ValueCountFrequency (%)
(space) 225180
18.5%
o 178178
14.7%
e 114465
9.4%
n 104591
8.6%
? 98015
8.1%
m 93699
7.7%
r 88930
 
7.3%
N 82977
 
6.8%
v 82547
 
6.8%
t 33320
 
2.7%
Other values (16) 114137
9.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 791934
65.1%
Space Separator 225180
 
18.5%
Uppercase Letter 100910
 
8.3%
Other Punctuation 98015
 
8.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 178178
22.5%
e 114465
14.5%
n 104591
13.2%
m 93699
11.8%
r 88930
11.2%
v 82547
10.4%
t 33320
 
4.2%
a 18954
 
2.4%
u 14962
 
1.9%
c 12571
 
1.6%
Other values (8) 49717
 
6.3%
Uppercase Letter
ValueCountFrequency (%)
N 82977
82.2%
S 10751
 
10.7%
D 5423
 
5.4%
W 678
 
0.7%
M 551
 
0.5%
A 530
 
0.5%
Space Separator
ValueCountFrequency (%)
(space) 225180
100.0%
Other Punctuation
ValueCountFrequency (%)
? 98015
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 892844
73.4%
Common 323195
 
26.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 178178
20.0%
e 114465
12.8%
n 104591
11.7%
m 93699
10.5%
r 88930
10.0%
N 82977
9.3%
v 82547
9.2%
t 33320
 
3.7%
a 18954
 
2.1%
u 14962
 
1.7%
Other values (14) 80221
9.0%
Common
ValueCountFrequency (%)
(space) 225180
69.7%
? 98015
30.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1216039
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 225180
18.5%
o 178178
14.7%
e 114465
9.4%
n 104591
8.6%
? 98015
8.1%
m 93699
7.7%
r 88930
 
7.3%
N 82977
 
6.8%
v 82547
 
6.8%
t 33320
 
2.7%
Other values (16) 114137
9.4%

live_in_this_house_1_year_ago
Categorical

High correlation 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe
99434 
Yes
81128 
No
15732 

Length

Max length15
Median length15
Mean length9.4919763
Min length3

Characters and Unicode

Total characters1863218
Distinct characters12
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot in universe
2nd row No
3rd rowNot in universe
4th row Yes
5th row Yes

Common Values

Value             Count   Frequency (%)
Not in universe   99434   50.7%
Yes               81128   41.3%
No                15732   8.0%
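The length statistics for this variable (minimum length 3, total characters 1863218) do not match the values as displayed, since `No` has only two characters. One reading that fits the arithmetic is that `Yes` and `No` each carry one leading space while `Not in universe` does not. That is an inference from the totals, not something the report states, and the sketch below simply checks both hypotheses.

```python
ROWS = 196294
REPORTED_TOTAL_CHARACTERS = 1863218

# Row counts copied by hand from the Common Values table.
counts = {"Not in universe": 99434, "Yes": 81128, "No": 15732}

def total_characters(counts, padded=()):
    """Total string length over all rows; values in `padded` get one extra character."""
    return sum(n * (len(v) + (1 if v in padded else 0)) for v, n in counts.items())

as_displayed = total_characters(counts)
with_padding = total_characters(counts, padded=("Yes", "No"))
mean_length = with_padding / ROWS

print("as displayed:", as_displayed)
print("Yes/No padded:", with_padding, "mean length:", round(mean_length, 7))
```

Only the padded variant reproduces the reported total and mean length, which suggests stripping whitespace before comparing or joining on this column.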

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 99434
25.2%
in 99434
25.2%
universe 99434
25.2%
yes 81128
20.5%
no 15732
 
4.0%

Most occurring characters

ValueCountFrequency (%)
(space) 295728
15.9%
e 279996
15.0%
i 198868
10.7%
n 198868
10.7%
s 180562
9.7%
N 115166
 
6.2%
o 115166
 
6.2%
t 99434
 
5.3%
u 99434
 
5.3%
v 99434
 
5.3%
Other values (2) 180562
9.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1371196
73.6%
Space Separator 295728
 
15.9%
Uppercase Letter 196294
 
10.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 279996
20.4%
i 198868
14.5%
n 198868
14.5%
s 180562
13.2%
o 115166
8.4%
t 99434
 
7.3%
u 99434
 
7.3%
v 99434
 
7.3%
r 99434
 
7.3%
Uppercase Letter
ValueCountFrequency (%)
N 115166
58.7%
Y 81128
41.3%
Space Separator
ValueCountFrequency (%)
(space) 295728
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1567490
84.1%
Common 295728
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 279996
17.9%
i 198868
12.7%
n 198868
12.7%
s 180562
11.5%
N 115166
7.3%
o 115166
7.3%
t 99434
 
6.3%
u 99434
 
6.3%
v 99434
 
6.3%
r 99434
 
6.3%
Common
ValueCountFrequency (%)
(space) 295728
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1863218
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 295728
15.9%
e 279996
15.0%
i 198868
10.7%
n 198868
10.7%
s 180562
9.7%
N 115166
 
6.2%
o 115166
 
6.2%
t 99434
 
5.3%
u 99434
 
5.3%
v 99434
 
5.3%
Other values (2) 180562
9.7%

migration_prev_res_in_sunbelt
Categorical

High correlation  Imbalance 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe
180562 
No
 
9959
Yes
 
5773

Length

Max length15
Median length15
Mean length14.067669
Min length3

Characters and Unicode

Total characters2761399
Distinct characters12
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot in universe
2nd row Yes
3rd rowNot in universe
4th rowNot in universe
5th rowNot in universe

Common Values

Value             Count    Frequency (%)
Not in universe   180562   92.0%
No                9959     5.1%
Yes               5773     2.9%
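This variable carries an Imbalance alert, while live_in_this_house_1_year_ago, which also has three categories, does not. One common way to quantify this is one minus the normalised Shannon entropy of the class shares: 0 for perfectly balanced classes, approaching 1 when a single class dominates. Whether the report uses exactly this score, or which cut-off it applies, is an assumption here; the function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    if len(shares) < 2:
        return 0.0  # a single class has no entropy to normalise
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1.0 - entropy / math.log2(len(shares))

# Counts copied by hand from the two Common Values tables.
sunbelt = imbalance_score([180562, 9959, 5773])   # migration_prev_res_in_sunbelt
house = imbalance_score([99434, 81128, 15732])    # live_in_this_house_1_year_ago

print(f"sunbelt={sunbelt:.3f} house={house:.3f}")
```

Under this definition the flagged variable scores well above the unflagged one, which is at least consistent with the alerts shown in the report.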

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 180562
32.4%
in 180562
32.4%
universe 180562
32.4%
no 9959
 
1.8%
yes 5773
 
1.0%

Most occurring characters

ValueCountFrequency (%)
(space) 376856
13.6%
e 366897
13.3%
i 361124
13.1%
n 361124
13.1%
N 190521
6.9%
o 190521
6.9%
s 186335
6.7%
t 180562
6.5%
u 180562
6.5%
v 180562
6.5%
Other values (2) 186335
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2188249
79.2%
Space Separator 376856
 
13.6%
Uppercase Letter 196294
 
7.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 366897
16.8%
i 361124
16.5%
n 361124
16.5%
o 190521
8.7%
s 186335
8.5%
t 180562
8.3%
u 180562
8.3%
v 180562
8.3%
r 180562
8.3%
Uppercase Letter
ValueCountFrequency (%)
N 190521
97.1%
Y 5773
 
2.9%
Space Separator
ValueCountFrequency (%)
(space) 376856
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2384543
86.4%
Common 376856
 
13.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 366897
15.4%
i 361124
15.1%
n 361124
15.1%
N 190521
8.0%
o 190521
8.0%
s 186335
7.8%
t 180562
7.6%
u 180562
7.6%
v 180562
7.6%
r 180562
7.6%
Common
ValueCountFrequency (%)
(space) 376856
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2761399
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 376856
13.6%
e 366897
13.3%
i 361124
13.1%
n 361124
13.1%
N 190521
6.9%
o 190521
6.9%
s 186335
6.7%
t 180562
6.5%
u 180562
6.5%
v 180562
6.5%
Other values (2) 186335
6.7%

num_persons_worked_for_employer
Real number (ℝ)

High correlation  Zeros 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.9881046
Minimum0
Maximum6
Zeros92770
Zeros (%)47.3%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 4
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation2.3710177
Coefficient of variation (CV)1.1926021
Kurtosis-1.1180198
Mean1.9881046
Median Absolute Deviation (MAD)1
Skewness0.72692272
Sum390253
Variance5.6217248
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value   Count   Frequency (%)
0       92770   47.3%
6       36507   18.6%
1       23103   11.8%
4       14377   7.3%
3       13424   6.8%
2       10079   5.1%
5       6034    3.1%

Value   Count   Frequency (%)
0       92770   47.3%
1       23103   11.8%
2       10079   5.1%
3       13424   6.8%
4       14377   7.3%
5       6034    3.1%
6       36507   18.6%

Value   Count   Frequency (%)
6       36507   18.6%
5       6034    3.1%
4       14377   7.3%
3       13424   6.8%
2       10079   5.1%
1       23103   11.8%
0       92770   47.3%
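Because this variable takes only seven distinct values, its summary statistics can be recovered from the value counts alone. The sketch below recomputes the row total, sum, mean and share of zeros from the table. It is a cross-check on the reported numbers rather than the profiler's method, and the variable names are illustrative.

```python
# Value -> row count, copied by hand from the tables above.
value_counts = {0: 92770, 1: 23103, 2: 10079, 3: 13424, 4: 14377, 5: 6034, 6: 36507}

n_rows = sum(value_counts.values())
total = sum(value * n for value, n in value_counts.items())
mean = total / n_rows
zero_share = value_counts[0] / n_rows

print(f"rows={n_rows} sum={total} mean={mean:.7f} zeros={100 * zero_share:.1f}%")
```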

family_members_under_18
Categorical

High correlation  Imbalance 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Not in universe
144161 
Both parents present
36107 
Mother only present
 
12517
Father only present
 
1871
Neither parent present
 
1638

Length

Max length23
Median length16
Mean length17.271323
Min length16

Characters and Unicode

Total characters3390257
Distinct characters19
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Both parents present
5th row Both parents present

Common Values

Value                    Count    Frequency (%)
Not in universe          144161   73.4%
Both parents present     36107    18.4%
Mother only present      12517    6.4%
Father only present      1871     1.0%
Neither parent present   1638     0.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 144161
24.5%
in 144161
24.5%
universe 144161
24.5%
present 52133
 
8.9%
both 36107
 
6.1%
parents 36107
 
6.1%
only 14388
 
2.4%
mother 12517
 
2.1%
father 1871
 
0.3%
neither 1638
 
0.3%

Most occurring characters

ValueCountFrequency (%)
(space) 588882
17.4%
e 447997
13.2%
n 392588
11.6%
i 289960
8.6%
t 286172
8.4%
r 250065
7.4%
s 232401
 
6.9%
o 207173
 
6.1%
N 145799
 
4.3%
u 144161
 
4.3%
Other values (9) 405059
11.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2605081
76.8%
Space Separator 588882
 
17.4%
Uppercase Letter 196294
 
5.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 447997
17.2%
n 392588
15.1%
i 289960
11.1%
t 286172
11.0%
r 250065
9.6%
s 232401
8.9%
o 207173
8.0%
u 144161
 
5.5%
v 144161
 
5.5%
p 89878
 
3.5%
Other values (4) 120525
 
4.6%
Uppercase Letter
ValueCountFrequency (%)
N 145799
74.3%
B 36107
 
18.4%
M 12517
 
6.4%
F 1871
 
1.0%
Space Separator
ValueCountFrequency (%)
(space) 588882
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2801375
82.6%
Common 588882
 
17.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 447997
16.0%
n 392588
14.0%
i 289960
10.4%
t 286172
10.2%
r 250065
8.9%
s 232401
8.3%
o 207173
7.4%
N 145799
 
5.2%
u 144161
 
5.1%
v 144161
 
5.1%
Other values (8) 260898
9.3%
Common
ValueCountFrequency (%)
(space) 588882
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3390257
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 588882
17.4%
e 447997
13.2%
n 392588
11.6%
i 289960
8.6%
t 286172
8.4%
r 250065
7.4%
s 232401
 
6.9%
o 207173
 
6.1%
N 145799
 
4.3%
u 144161
 
4.3%
Other values (9) 405059
11.9%

country_of_birth_father
Categorical

High correlation  Imbalance 

Distinct43
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
United-States
156037 
Mexico
 
9948
?
 
6703
Puerto-Rico
 
2676
Italy
 
2212
Other values (38)
18718 

Length

Max length29
Median length14
Mean length12.650193
Min length2

Characters and Unicode

Total characters2483157
Distinct characters47
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row United-States
2nd row United-States
3rd row Vietnam
4th row United-States
5th row United-States

Common Values

Value                Count    Frequency (%)
United-States        156037   79.5%
Mexico               9948     5.1%
?                    6703     3.4%
Puerto-Rico          2676     1.4%
Italy                2212     1.1%
Canada               1380     0.7%
Germany              1356     0.7%
Dominican-Republic   1284     0.7%
Poland               1210     0.6%
Philippines          1152     0.6%
Other values (33)    12336    6.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united-states 156037
79.0%
mexico 9948
 
5.0%
(blank token) 6703
 
3.4%
puerto-rico 2676
 
1.4%
italy 2212
 
1.1%
canada 1380
 
0.7%
germany 1356
 
0.7%
dominican-republic 1284
 
0.6%
poland 1210
 
0.6%
philippines 1152
 
0.6%
Other values (39) 13608
 
6.9%

Most occurring characters

ValueCountFrequency (%)
t 475783
19.2%
e 332247
13.4%
(space) 197566
8.0%
a 182642
 
7.4%
i 180939
 
7.3%
n 170161
 
6.9%
d 162934
 
6.6%
- 161189
 
6.5%
S 158114
 
6.4%
s 157805
 
6.4%
Other values (37) 303777
12.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1764787
71.1%
Uppercase Letter 352482
 
14.2%
Space Separator 197566
 
8.0%
Dash Punctuation 161189
 
6.5%
Other Punctuation 6815
 
0.3%
Open Punctuation 159
 
< 0.1%
Close Punctuation 159
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 475783
27.0%
e 332247
18.8%
a 182642
 
10.3%
i 180939
 
10.3%
n 170161
 
9.6%
d 162934
 
9.2%
s 157805
 
8.9%
o 22709
 
1.3%
c 17286
 
1.0%
l 11397
 
0.6%
Other values (11) 50884
 
2.9%
Uppercase Letter
ValueCountFrequency (%)
S 158114
44.9%
U 156355
44.4%
M 9948
 
2.8%
P 5785
 
1.6%
C 4164
 
1.2%
R 3960
 
1.1%
I 3691
 
1.0%
G 2302
 
0.7%
E 2151
 
0.6%
D 1284
 
0.4%
Other values (10) 4728
 
1.3%
Other Punctuation
ValueCountFrequency (%)
? 6703
98.4%
& 112
 
1.6%
Space Separator
ValueCountFrequency (%)
(space) 197566
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 161189
100.0%
Open Punctuation
ValueCountFrequency (%)
( 159
100.0%
Close Punctuation
ValueCountFrequency (%)
) 159
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2117269
85.3%
Common 365888
 
14.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 475783
22.5%
e 332247
15.7%
a 182642
 
8.6%
i 180939
 
8.5%
n 170161
 
8.0%
d 162934
 
7.7%
S 158114
 
7.5%
s 157805
 
7.5%
U 156355
 
7.4%
o 22709
 
1.1%
Other values (31) 117580
 
5.6%
Common
ValueCountFrequency (%)
(space) 197566
54.0%
- 161189
44.1%
? 6703
 
1.8%
( 159
 
< 0.1%
) 159
 
< 0.1%
& 112
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2483157
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 475783
19.2%
e 332247
13.4%
(space) 197566
8.0%
a 182642
 
7.4%
i 180939
 
7.3%
n 170161
 
6.9%
d 162934
 
6.6%
- 161189
 
6.5%
S 158114
 
6.4%
s 157805
 
6.4%
Other values (37) 303777
12.2%

country_of_birth_mother
Categorical

High correlation  Imbalance 

Distinct43
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
United-States
157355 
Mexico
 
9721
?
 
6107
Puerto-Rico
 
2468
Italy
 
1844
Other values (38)
18799 

Length

Max length29
Median length14
Mean length12.703582
Min length2

Characters and Unicode

Total characters2493637
Distinct characters47
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row United-States
2nd row United-States
3rd row Vietnam
4th row United-States
5th row United-States

Common Values

Value               Count    Frequency (%)
United-States       157355   80.2%
Mexico              9721     5.0%
?                   6107     3.1%
Puerto-Rico         2468     1.3%
Italy               1844     0.9%
Canada              1451     0.7%
Germany             1382     0.7%
Philippines         1228     0.6%
Poland              1109     0.6%
El-Salvador         1107     0.6%
Other values (33)   12522    6.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united-states 157355
79.6%
mexico 9721
 
4.9%
(blank token) 6107
 
3.1%
puerto-rico 2468
 
1.2%
italy 1844
 
0.9%
canada 1451
 
0.7%
germany 1382
 
0.7%
philippines 1228
 
0.6%
poland 1109
 
0.6%
el-salvador 1107
 
0.6%
Other values (39) 13864
 
7.0%

Most occurring characters

ValueCountFrequency (%)
t 479197
19.2%
e 334332
13.4%
(space) 197636
7.9%
a 183901
 
7.4%
i 181335
 
7.3%
n 171510
 
6.9%
d 164509
 
6.6%
- 162233
 
6.5%
S 159624
 
6.4%
s 159182
 
6.4%
Other values (37) 300178
12.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1773075
71.1%
Uppercase Letter 354174
 
14.2%
Space Separator 197636
 
7.9%
Dash Punctuation 162233
 
6.5%
Other Punctuation 6205
 
0.2%
Open Punctuation 157
 
< 0.1%
Close Punctuation 157
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 479197
27.0%
e 334332
18.9%
a 183901
 
10.4%
i 181335
 
10.2%
n 171510
 
9.7%
d 164509
 
9.3%
s 159182
 
9.0%
o 21918
 
1.2%
c 16382
 
0.9%
l 11183
 
0.6%
Other values (11) 49626
 
2.8%
Uppercase Letter
ValueCountFrequency (%)
S 159624
45.1%
U 157669
44.5%
M 9721
 
2.7%
P 5533
 
1.6%
C 4082
 
1.2%
R 3565
 
1.0%
I 3378
 
1.0%
E 2383
 
0.7%
G 2242
 
0.6%
D 1097
 
0.3%
Other values (10) 4880
 
1.4%
Other Punctuation
ValueCountFrequency (%)
? 6107
98.4%
& 98
 
1.6%
Space Separator
ValueCountFrequency (%)
(space) 197636
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 162233
100.0%
Open Punctuation
ValueCountFrequency (%)
( 157
100.0%
Close Punctuation
ValueCountFrequency (%)
) 157
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2127249
85.3%
Common 366388
 
14.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 479197
22.5%
e 334332
15.7%
a 183901
 
8.6%
i 181335
 
8.5%
n 171510
 
8.1%
d 164509
 
7.7%
S 159624
 
7.5%
s 159182
 
7.5%
U 157669
 
7.4%
o 21918
 
1.0%
Other values (31) 114072
 
5.4%
Common
ValueCountFrequency (%)
(space) 197636
53.9%
- 162233
44.3%
? 6107
 
1.7%
( 157
 
< 0.1%
) 157
 
< 0.1%
& 98
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2493637
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 479197
19.2%
e 334332
13.4%
(space) 197636
7.9%
a 183901
 
7.4%
i 181335
 
7.3%
n 171510
 
6.9%
d 164509
 
6.6%
- 162233
 
6.5%
S 159624
 
6.4%
s 159182
 
6.4%
Other values (37) 300178
12.0%

country_of_birth_self
Categorical

High correlation  Imbalance 

Distinct43
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
United-States
173783 
Mexico
 
5759
?
 
3389
Puerto-Rico
 
1400
Germany
 
850
Other values (38)
 
11113

Length

Max length29
Median length14
Mean length13.268597
Min length2

Characters and Unicode

Total characters2604546
Distinct characters47
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row United-States
2nd row United-States
3rd row Vietnam
4th row United-States
5th row United-States

Common Values

Value                Count    Frequency (%)
United-States        173783   88.5%
Mexico               5759     2.9%
?                    3389     1.7%
Puerto-Rico          1400     0.7%
Germany              850      0.4%
Philippines          844      0.4%
Cuba                 836      0.4%
Canada               700      0.4%
El-Salvador          689      0.4%
Dominican-Republic   687      0.3%
Other values (33)    7357     3.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united-states 173783
88.1%
mexico 5759
 
2.9%
(blank token) 3389
 
1.7%
puerto-rico 1400
 
0.7%
germany 850
 
0.4%
philippines 844
 
0.4%
cuba 836
 
0.4%
canada 700
 
0.4%
el-salvador 689
 
0.3%
dominican-republic 687
 
0.3%
Other values (39) 8404
 
4.3%

Most occurring characters

ValueCountFrequency (%)
t 525111
20.2%
e 359441
13.8%
(space) 197341
 
7.6%
a 189262
 
7.3%
i 188898
 
7.3%
n 181941
 
7.0%
d 177412
 
6.8%
- 176701
 
6.8%
S 175256
 
6.7%
s 174965
 
6.7%
Other values (37) 258218
9.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1855854
71.3%
Uppercase Letter 370957
 
14.2%
Space Separator 197341
 
7.6%
Dash Punctuation 176701
 
6.8%
Other Punctuation 3455
 
0.1%
Open Punctuation 119
 
< 0.1%
Close Punctuation 119
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 525111
28.3%
e 359441
19.4%
a 189262
 
10.2%
i 188898
 
10.2%
n 181941
 
9.8%
d 177412
 
9.6%
s 174965
 
9.4%
o 12963
 
0.7%
c 9791
 
0.5%
x 5759
 
0.3%
Other values (11) 30311
 
1.6%
Uppercase Letter
ValueCountFrequency (%)
S 175256
47.2%
U 174021
46.9%
M 5759
 
1.6%
P 3095
 
0.8%
C 2542
 
0.7%
R 2087
 
0.6%
G 1459
 
0.4%
E 1402
 
0.4%
I 1237
 
0.3%
D 687
 
0.2%
Other values (10) 3412
 
0.9%
Other Punctuation
ValueCountFrequency (%)
? 3389
98.1%
& 66
 
1.9%
Space Separator
ValueCountFrequency (%)
(space) 197341
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 176701
100.0%
Open Punctuation
ValueCountFrequency (%)
( 119
100.0%
Close Punctuation
ValueCountFrequency (%)
) 119
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2226811
85.5%
Common 377735
 
14.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 525111
23.6%
e 359441
16.1%
a 189262
 
8.5%
i 188898
 
8.5%
n 181941
 
8.2%
d 177412
 
8.0%
S 175256
 
7.9%
s 174965
 
7.9%
U 174021
 
7.8%
o 12963
 
0.6%
Other values (31) 67541
 
3.0%
Common
ValueCountFrequency (%)
197341
52.2%
- 176701
46.8%
? 3389
 
0.9%
( 119
 
< 0.1%
) 119
 
< 0.1%
& 66
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2604546
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 525111
20.2%
e 359441
13.8%
197341
 
7.6%
a 189262
 
7.3%
i 188898
 
7.3%
n 181941
 
7.0%
d 177412
 
6.8%
- 176701
 
6.8%
S 175256
 
6.7%
s 174965
 
6.7%
Other values (37) 258218
9.9%

citizenship
Categorical

High correlation  Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 MiB

Native: 177058
Foreign: 13385
Naturalized: 5851

Length

Max length: 11
Median length: 6
Mean length: 6.2172252
Min length: 6

Characters and Unicode

Total characters: 1220404
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
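
The length statistics reported for `citizenship` can be rederived from the three category counts alone. The following is a minimal sketch of that check; the dictionary only restates the counts listed in this report, and the variable names are illustrative rather than anything the profiling tool exposes.

```python
from statistics import median

# Category counts as reported for `citizenship` in this report.
counts = {"Native": 177058, "Foreign": 13385, "Naturalized": 5851}

n_rows = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / n_rows

# Expand to one length per row so median, min and max can be read off directly.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

print(n_rows, total_chars, round(mean_length, 7))
print(min(lengths), median(lengths), max(lengths))
```

Because roughly 90% of rows hold the 6-character value `Native`, the median and minimum lengths coincide at 6 while the mean sits slightly above it.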

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Native
2nd row: Native
3rd row: Foreign
4th row: Native
5th row: Native

Common Values

Value | Count | Frequency (%)
Native | 177058 | 90.2%
Foreign | 13385 | 6.8%
Naturalized | 5851 | 3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
native 177058
90.2%
foreign 13385
 
6.8%
naturalized 5851
 
3.0%

Most occurring characters

ValueCountFrequency (%)
i 196294
16.1%
e 196294
16.1%
a 188760
15.5%
N 182909
15.0%
t 182909
15.0%
v 177058
14.5%
r 19236
 
1.6%
F 13385
 
1.1%
o 13385
 
1.1%
g 13385
 
1.1%
Other values (5) 36789
 
3.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1024110
83.9%
Uppercase Letter 196294
 
16.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 196294
19.2%
e 196294
19.2%
a 188760
18.4%
t 182909
17.9%
v 177058
17.3%
r 19236
 
1.9%
o 13385
 
1.3%
g 13385
 
1.3%
n 13385
 
1.3%
u 5851
 
0.6%
Other values (3) 17553
 
1.7%
Uppercase Letter
ValueCountFrequency (%)
N 182909
93.2%
F 13385
 
6.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 1220404
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 196294
16.1%
e 196294
16.1%
a 188760
15.5%
N 182909
15.0%
t 182909
15.0%
v 177058
14.5%
r 19236
 
1.6%
F 13385
 
1.1%
o 13385
 
1.1%
g 13385
 
1.1%
Other values (5) 36789
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1220404
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 196294
16.1%
e 196294
16.1%
a 188760
15.5%
N 182909
15.0%
t 182909
15.0%
v 177058
14.5%
r 19236
 
1.6%
F 13385
 
1.1%
o 13385
 
1.1%
g 13385
 
1.1%
Other values (5) 36789
 
3.0%

own_business_or_self_employed
Categorical

Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 MiB

Not in universe: 177445
No: 16151
Yes: 2698

Length

Max length: 15
Median length: 15
Mean length: 13.765428
Min length: 2

Characters and Unicode

Total characters: 2702071
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
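
As an illustration of how such per-category character counts can arise, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module, weighting each character by the row count of its value. It is not the report's own implementation; the counts are the ones listed for `own_business_or_self_employed`, and `unicodedata.category` returns two-letter codes (`Ll`, `Lu`, `Zs`) that correspond to the long names used in the report (Lowercase Letter, Uppercase Letter, Space Separator).

```python
import unicodedata
from collections import Counter

# Category counts as reported for `own_business_or_self_employed`.
counts = {"Not in universe": 177445, "No": 16151, "Yes": 2698}

category_totals = Counter()  # Unicode general category -> number of characters
char_totals = Counter()      # individual character -> number of occurrences

for value, n in counts.items():
    for ch in value:
        category_totals[unicodedata.category(ch)] += n
        char_totals[ch] += n

print(dict(category_totals))
print(len(char_totals), sum(char_totals.values()))
```

The same loop applies to any categorical column in the report once its value counts are known.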

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not in universe
2nd row: Not in universe
3rd row: Not in universe
4th row: Not in universe
5th row: Not in universe

Common Values

Value | Count | Frequency (%)
Not in universe | 177445 | 90.4%
No | 16151 | 8.2%
Yes | 2698 | 1.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 177445
32.2%
in 177445
32.2%
universe 177445
32.2%
no 16151
 
2.9%
yes 2698
 
0.5%

Most occurring characters

ValueCountFrequency (%)
e 357588
13.2%
354890
13.1%
i 354890
13.1%
n 354890
13.1%
N 193596
7.2%
o 193596
7.2%
s 180143
6.7%
t 177445
6.6%
u 177445
6.6%
v 177445
6.6%
Other values (2) 180143
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2150887
79.6%
Space Separator 354890
 
13.1%
Uppercase Letter 196294
 
7.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 357588
16.6%
i 354890
16.5%
n 354890
16.5%
o 193596
9.0%
s 180143
8.4%
t 177445
8.2%
u 177445
8.2%
v 177445
8.2%
r 177445
8.2%
Uppercase Letter
ValueCountFrequency (%)
N 193596
98.6%
Y 2698
 
1.4%
Space Separator
ValueCountFrequency (%)
354890
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2347181
86.9%
Common 354890
 
13.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 357588
15.2%
i 354890
15.1%
n 354890
15.1%
N 193596
8.2%
o 193596
8.2%
s 180143
7.7%
t 177445
7.6%
u 177445
7.6%
v 177445
7.6%
r 177445
7.6%
Common
ValueCountFrequency (%)
354890
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2702071
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 357588
13.2%
354890
13.1%
i 354890
13.1%
n 354890
13.1%
N 193596
7.2%
o 193596
7.2%
s 180143
6.7%
t 177445
6.6%
u 177445
6.6%
v 177445
6.6%
Other values (2) 180143
6.7%

fill_inc_questionnaire_for_veteran's_admin
Categorical

High correlation  Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 MiB

Not in universe: 194310
No: 1593
Yes: 391

Length

Max length: 16
Median length: 16
Mean length: 15.870597
Min length: 3

Characters and Unicode

Total characters: 3115303
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value | Count | Frequency (%)
Not in universe | 194310 | 99.0%
No | 1593 | 0.8%
Yes | 391 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 194310
33.2%
in 194310
33.2%
universe 194310
33.2%
no 1593
 
0.3%
yes 391
 
0.1%

Most occurring characters

ValueCountFrequency (%)
584914
18.8%
e 389011
12.5%
i 388620
12.5%
n 388620
12.5%
N 195903
 
6.3%
o 195903
 
6.3%
s 194701
 
6.2%
t 194310
 
6.2%
u 194310
 
6.2%
v 194310
 
6.2%
Other values (2) 194701
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2334095
74.9%
Space Separator 584914
 
18.8%
Uppercase Letter 196294
 
6.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 389011
16.7%
i 388620
16.6%
n 388620
16.6%
o 195903
8.4%
s 194701
8.3%
t 194310
8.3%
u 194310
8.3%
v 194310
8.3%
r 194310
8.3%
Uppercase Letter
ValueCountFrequency (%)
N 195903
99.8%
Y 391
 
0.2%
Space Separator
ValueCountFrequency (%)
584914
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2530389
81.2%
Common 584914
 
18.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 389011
15.4%
i 388620
15.4%
n 388620
15.4%
N 195903
7.7%
o 195903
7.7%
s 194701
7.7%
t 194310
7.7%
u 194310
7.7%
v 194310
7.7%
r 194310
7.7%
Common
ValueCountFrequency (%)
584914
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3115303
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
584914
18.8%
e 389011
12.5%
i 388620
12.5%
n 388620
12.5%
N 195903
 
6.3%
o 195903
 
6.3%
s 194701
 
6.2%
t 194310
 
6.2%
u 194310
 
6.2%
v 194310
 
6.2%
Other values (2) 194701
 
6.2%

veterans_benefits
Categorical

High correlation 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 MiB

Not a Veteran: 149976
Not in universe: 44334
Veteran: 1984

Length

Max length: 15
Median length: 13
Mean length: 13.391066
Min length: 7

Characters and Unicode

Total characters: 2628586
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not a Veteran
2nd row: Not a Veteran
3rd row: Not a Veteran
4th row: Not in universe
5th row: Not in universe

Common Values

Value | Count | Frequency (%)
Not a Veteran | 149976 | 76.4%
Not in universe | 44334 | 22.6%
Veteran | 1984 | 1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 194310
33.2%
veteran 151960
26.0%
a 149976
25.6%
in 44334
 
7.6%
universe 44334
 
7.6%

Most occurring characters

ValueCountFrequency (%)
e 392588
14.9%
388620
14.8%
t 346270
13.2%
a 301936
11.5%
n 240628
9.2%
r 196294
7.5%
N 194310
7.4%
o 194310
7.4%
V 151960
 
5.8%
i 88668
 
3.4%
Other values (3) 133002
 
5.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1893696
72.0%
Space Separator 388620
 
14.8%
Uppercase Letter 346270
 
13.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 392588
20.7%
t 346270
18.3%
a 301936
15.9%
n 240628
12.7%
r 196294
10.4%
o 194310
10.3%
i 88668
 
4.7%
u 44334
 
2.3%
v 44334
 
2.3%
s 44334
 
2.3%
Uppercase Letter
ValueCountFrequency (%)
N 194310
56.1%
V 151960
43.9%
Space Separator
ValueCountFrequency (%)
388620
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2239966
85.2%
Common 388620
 
14.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 392588
17.5%
t 346270
15.5%
a 301936
13.5%
n 240628
10.7%
r 196294
8.8%
N 194310
8.7%
o 194310
8.7%
V 151960
 
6.8%
i 88668
 
4.0%
u 44334
 
2.0%
Other values (2) 88668
 
4.0%
Common
ValueCountFrequency (%)
388620
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2628586
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 392588
14.9%
388620
14.8%
t 346270
13.2%
a 301936
11.5%
n 240628
9.2%
r 196294
7.5%
N 194310
7.4%
o 194310
7.4%
V 151960
 
5.8%
i 88668
 
3.4%
Other values (3) 133002
 
5.1%

weeks_worked_in_year
Real number (ℝ)

High correlation  Zeros 

Distinct: 53
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.553889
Minimum: 0
Maximum: 52
Zeros: 92770
Zeros (%): 47.3%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 12
Q3: 52
95-th percentile: 52
Maximum: 52
Range: 52
Interquartile range (IQR): 52

Descriptive statistics

Standard deviation: 24.428588
Coefficient of variation (CV): 1.0371361
Kurtosis: -1.8743188
Mean: 23.553889
Median Absolute Deviation (MAD): 12
Skewness: 0.18056487
Sum: 4623487
Variance: 596.75593
Monotonicity: Not monotonic
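
Several of these figures are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. The sketch below checks them using only the rounded values printed in this report, so small discrepancies in the last digits are expected; the variable names are illustrative.

```python
# Figures as reported for `weeks_worked_in_year` (rounded by the report).
n_rows = 196294          # number of observations in the dataset
total = 4623487          # Sum
std = 24.428588          # Standard deviation
q1, q3 = 0, 52           # Q1 and Q3
minimum, maximum = 0, 52

mean = total / n_rows
cv = std / mean              # coefficient of variation
variance = std ** 2
iqr = q3 - q1
value_range = maximum - minimum

print(round(mean, 6), round(cv, 7), round(variance, 5), iqr, value_range)
```

A coefficient of variation above 1 together with Q1 = 0 and Q3 = 52 is consistent with the value counts for this variable: most rows sit at either 0 or 52 weeks.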
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 92770 | 47.3%
52 | 70308 | 35.8%
40 | 2790 | 1.4%
50 | 2304 | 1.2%
26 | 2268 | 1.2%
48 | 1806 | 0.9%
12 | 1777 | 0.9%
30 | 1378 | 0.7%
20 | 1330 | 0.7%
8 | 1125 | 0.6%
Other values (43) | 18438 | 9.4%

Minimum 10 values

Value | Count | Frequency (%)
0 | 92770 | 47.3%
1 | 464 | 0.2%
2 | 457 | 0.2%
3 | 417 | 0.2%
4 | 757 | 0.4%
5 | 309 | 0.2%
6 | 645 | 0.3%
7 | 152 | 0.1%
8 | 1125 | 0.6%
9 | 239 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
52 | 70308 | 35.8%
51 | 819 | 0.4%
50 | 2304 | 1.2%
49 | 509 | 0.3%
48 | 1806 | 0.9%
47 | 278 | 0.1%
46 | 708 | 0.4%
45 | 669 | 0.3%
44 | 845 | 0.4%
43 | 374 | 0.2%

year
Categorical

High correlation 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 MiB

1994: 98279
1995: 98015

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 785176
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1995
2nd row: 1994
3rd row: 1995
4th row: 1994
5th row: 1994

Common Values

Value | Count | Frequency (%)
1994 | 98279 | 50.1%
1995 | 98015 | 49.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1994 98279
50.1%
1995 98015
49.9%

Most occurring characters

ValueCountFrequency (%)
9 392588
50.0%
1 196294
25.0%
4 98279
 
12.5%
5 98015
 
12.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 785176
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
9 392588
50.0%
1 196294
25.0%
4 98279
 
12.5%
5 98015
 
12.5%

Most occurring scripts

ValueCountFrequency (%)
Common 785176
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
9 392588
50.0%
1 196294
25.0%
4 98279
 
12.5%
5 98015
 
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 785176
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9 392588
50.0%
1 196294
25.0%
4 98279
 
12.5%
5 98015
 
12.5%

target
Categorical

Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 MiB

1: 183912
0: 12382

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 196294
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 183912 | 93.7%
0 | 12382 | 6.3%
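
The Imbalance flag on `target` reflects how unevenly the two classes are split. One common way to put a number on this is one minus the normalized Shannon entropy of the class shares. The sketch below uses that definition; whether it matches the exact score and threshold behind this report's alert is an assumption, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1.0 - entropy / math.log2(len(counts))

target_counts = [183912, 12382]  # target: class 1 vs class 0, as reported
year_counts = [98279, 98015]     # year: 1994 vs 1995, as reported

print(round(imbalance_score(target_counts), 3))
print(round(imbalance_score(year_counts), 6))
```

Under this definition `target` scores far from zero while the nearly even `year` split scores close to zero, which agrees with `target` carrying the Imbalance flag and `year` not.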

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 183912
93.7%
0 12382
 
6.3%

Most occurring characters

ValueCountFrequency (%)
1 183912
93.7%
0 12382
 
6.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 196294
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 183912
93.7%
0 12382
 
6.3%

Most occurring scripts

ValueCountFrequency (%)
Common 196294
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 183912
93.7%
0 12382
 
6.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 196294
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 183912
93.7%
0 12382
 
6.3%

Interactions

[64 interaction plots (Matplotlib SVG images) are not reproduced in this text version.]

Correlations

age | capital_gains | capital_losses | citizenship | class_of_worker | country_of_birth_father | country_of_birth_mother | country_of_birth_self | detailed_household_and_family_stat | detailed_household_summary_in_household | detailed_industry_recode | detailed_occupation_recode | dividends_from_stocks | education | enroll_in_edu_inst_last_wk | family_members_under_18 | fill_inc_questionnaire_for_veteran's_admin | full_or_part_time_employment_stat | hispanic_origin | instance_weight | live_in_this_house_1_year_ago | major_industry_code | major_occupation_code | marital_stat | member_of_a_labor_union | migration_code_change_in_msa | migration_code_change_in_reg | migration_code_move_within_reg | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | own_business_or_self_employed | race | reason_for_unemployment | region_of_previous_residence | sex | target | tax_filer_stat | veterans_benefits | wage_per_hour | weeks_worked_in_year | year
age1.0000.1270.0680.1210.3650.0920.0860.0620.5070.3970.2450.2490.2510.4440.4340.4810.0900.3150.0510.0040.1150.2440.2430.4250.1720.0730.0660.0860.1080.2260.1920.0560.0800.0690.0610.2420.5850.6440.0390.2690.009
capital_gains0.1271.000-0.0280.0120.0480.0090.0090.0040.0420.0510.0490.0970.1170.0750.0220.0290.0000.0300.0120.0030.0080.0470.0640.0320.0190.0140.0060.0140.0080.1130.0270.0110.0040.0140.0580.3180.0590.0370.0050.1300.006
capital_losses0.068-0.0281.0000.0070.0510.0100.0060.0000.0470.0540.0380.0470.0670.0540.0210.0390.0100.0340.0060.0090.0040.0370.0400.0450.0200.0020.0020.0020.0040.0970.0230.0100.0040.0000.0730.1720.0950.0540.0050.1050.003
citizenship0.1210.0120.0071.0000.0570.5360.5420.7200.1070.1110.0980.1140.0090.1370.0180.0860.0160.0460.3970.0440.0310.0830.0930.1020.0130.0920.0210.0920.0290.0460.0240.2400.0270.0920.0100.0380.0550.0840.0170.0350.012
class_of_worker0.3650.0480.0510.0571.0000.0550.0530.0520.2610.2760.6390.5990.0150.3180.0900.2750.0260.3630.0470.0180.0370.6550.5490.2120.2780.0290.0210.0520.0360.5100.2080.0470.4360.0270.1230.2350.4860.3890.0830.4480.003
country_of_birth_father0.0920.0090.0100.5360.0551.0000.7710.6590.0950.0670.0290.0360.0050.1140.0360.0690.0240.0620.5320.0440.0370.0340.0490.0860.0430.0530.0290.0400.0470.0430.0490.4400.0190.0620.0250.0710.0770.0800.0050.0300.030
country_of_birth_mother0.0860.0090.0060.5420.0530.7711.0000.6860.0910.0640.0290.0360.0000.1120.0360.0660.0240.0590.5420.0430.0360.0340.0480.0830.0420.0530.0290.0400.0480.0410.0460.4430.0180.0620.0240.0700.0720.0760.0070.0280.030
country_of_birth_self0.0620.0040.0000.7200.0520.6590.6861.0000.0880.0630.0350.0420.0000.1230.0290.0620.0150.0450.4890.0320.0330.0390.0550.0700.0280.0590.0280.0450.0420.0360.0250.3800.0240.0660.0280.0590.0580.0840.0070.0210.024
detailed_household_and_family_stat0.5070.0420.0470.1070.2610.0950.0910.0881.0000.9810.2660.2750.0270.4280.2050.5270.0480.1800.0810.0330.0730.2670.2680.4870.1050.0620.0470.0890.0690.2610.0950.0750.0400.0600.0570.1950.5380.5040.0520.2760.005
detailed_household_summary_in_household0.3970.0510.0540.1110.2760.0670.0640.0630.9811.0000.2200.2340.0180.3970.3360.5290.0670.2150.0550.0370.0730.2200.2230.4190.1280.0510.0480.0650.0680.2280.1360.0720.0630.0480.3720.2250.6620.6030.0360.2210.006
detailed_industry_recode0.2450.0490.0380.0980.6390.0290.0290.0350.2660.2201.0000.4250.0090.3210.1270.2760.0290.3600.0520.0180.0360.9150.5990.1950.2610.0310.0280.0360.0390.4040.2080.0570.1550.0300.3010.2800.4880.3870.0670.3050.007
detailed_occupation_recode0.2490.0970.0470.1140.5990.0360.0360.0420.2750.2340.4251.0000.0190.4040.1610.2760.0320.3600.0630.0170.0360.5611.0000.2030.2610.0290.0260.0360.0370.3950.2140.0710.1630.0270.3950.4370.4970.3870.0770.3100.006
dividends_from_stocks0.2510.1170.0670.0090.0150.0050.0000.0000.0270.0180.0090.0191.0000.0400.0090.0190.0040.0150.0070.0110.0090.0130.0140.0230.0050.0070.0060.0060.0070.1470.0120.0100.0000.0070.0110.1460.0370.025-0.0000.1520.000
education0.4440.0750.0540.1370.3180.1140.1120.1230.4280.3970.3210.4040.0401.0000.3270.4540.0430.2790.1020.0210.0240.3190.3730.2990.1480.0200.0290.0740.0230.2860.1560.0680.0640.0170.0680.3770.5450.7070.0510.2870.009
enroll_in_edu_inst_last_wk0.4340.0220.0210.0180.0900.0360.0360.0290.2050.3360.1270.1610.0090.3271.0000.1570.0140.0730.0220.0220.0190.1330.1180.1990.0250.0210.0140.0270.0200.0710.0700.0260.0870.0210.0140.0650.1730.1020.0230.1840.003
family_members_under_180.4810.0290.0390.0860.2750.0690.0660.0620.5270.5290.2760.2760.0190.4540.1571.0000.0420.2210.0630.0170.0270.2760.2760.3500.1280.0220.0160.0740.0210.2830.1250.0990.0450.0180.0380.1560.5310.6280.0430.2830.006
fill_inc_questionnaire_for_veteran's_admin0.0900.0000.0100.0160.0260.0240.0240.0150.0480.0670.0290.0320.0040.0430.0140.0421.0000.0380.0200.0060.0060.0280.0270.0630.0090.0060.0020.0090.0060.0230.0050.0130.0040.0050.0640.0270.0260.7070.0000.0210.000
full_or_part_time_employment_stat0.3150.0300.0340.0460.3630.0620.0590.0450.1800.2150.3600.3600.0150.2790.0730.2210.0381.0000.0330.0220.5530.3600.3590.1890.1480.4490.5530.4580.1650.3090.1310.0220.0770.1350.1040.1510.2770.3040.0570.3250.793
hispanic_origin0.0510.0120.0060.3970.0470.5320.5420.4890.0810.0550.0520.0630.0070.1020.0220.0630.0200.0331.0000.0510.0410.0450.0540.0570.0450.0420.0330.0310.0570.0360.0340.1530.0200.0520.0130.0700.0800.0720.0090.0260.042
instance_weight0.0040.0030.0090.0440.0180.0440.0430.0320.0330.0370.0180.0170.0110.0210.0220.0170.0060.0220.0511.0000.0320.0160.0150.0230.0160.0270.0280.0170.0370.0370.0160.0830.0160.0290.0360.0120.0430.0200.0180.0250.030
live_in_this_house_1_year_ago0.1150.0080.0040.0310.0370.0370.0360.0330.0730.0730.0360.0360.0090.0240.0190.0270.0060.5530.0410.0321.0000.0340.0290.0610.0100.9920.8181.0000.7070.0350.0490.0450.0330.7070.0060.0290.0460.0190.0080.0410.986
major_industry_code0.2440.0470.0370.0830.6550.0340.0340.0390.2670.2200.9150.5610.0130.3190.1330.2760.0280.3600.0450.0160.0341.0000.5880.1950.2590.0280.0260.0350.0360.4020.2080.0520.1540.0270.2930.2760.4880.3870.0660.3060.007
major_occupation_code0.2430.0640.0400.0930.5490.0490.0480.0550.2680.2230.5991.0000.0140.3730.1180.2760.0270.3590.0540.0150.0290.5881.0000.1960.2460.0240.0230.0340.0300.3780.2080.0570.1520.0220.3370.3650.4910.3870.0670.3040.004
marital_stat0.4250.0320.0450.1020.2120.0860.0830.0700.4870.4190.1950.2030.0230.2990.1990.3500.0630.1890.0570.0230.0610.1950.1961.0000.0950.0410.0310.0600.0550.1890.0740.0820.0380.0370.1630.1940.7190.4470.0400.1990.000
member_of_a_labor_union0.1720.0190.0200.0130.2780.0430.0420.0280.1050.1280.2610.2610.0050.1480.0250.1280.0090.1480.0450.0160.0100.2590.2460.0951.0000.0110.0090.0240.0090.2270.0680.0220.0410.0110.0300.0740.1660.1260.3510.2210.000
migration_code_change_in_msa0.0730.0140.0020.0920.0290.0530.0530.0590.0620.0510.0310.0290.0070.0200.0210.0220.0060.4490.0420.0270.9920.0280.0240.0410.0111.0000.8560.7930.7060.0250.0500.0490.0240.6290.0050.0310.0500.0200.0050.0300.981
migration_code_change_in_reg0.0660.0060.0020.0210.0210.0290.0290.0280.0470.0480.0280.0260.0060.0290.0140.0160.0020.5530.0330.0280.8180.0260.0230.0310.0090.8561.0001.0000.4470.0290.0410.0410.0330.4670.0050.0140.0250.0160.0080.0400.986
migration_code_move_within_reg0.0860.0140.0020.0920.0520.0400.0400.0450.0890.0650.0360.0360.0060.0740.0270.0740.0090.4580.0310.0171.0000.0350.0340.0600.0240.7931.0001.0000.7430.0430.0570.0440.0280.7090.0090.0380.0920.1120.0030.0381.000
migration_prev_res_in_sunbelt0.1080.0080.0040.0290.0360.0470.0480.0420.0690.0680.0390.0370.0070.0230.0200.0210.0060.1650.0570.0370.7070.0360.0300.0550.0090.7060.4470.7431.0000.0280.0460.0250.0330.8730.0050.0290.0430.0070.0080.0410.295
num_persons_worked_for_employer0.2260.1130.0970.0460.5100.0430.0410.0360.2610.2280.4040.3950.1470.2860.0710.2830.0230.3090.0360.0370.0350.4020.3780.1890.2270.0250.0290.0430.0281.0000.2200.0480.0570.0200.1040.2350.5200.4050.2270.8760.031
own_business_or_self_employed0.1920.0270.0230.0240.2080.0490.0460.0250.0950.1360.2080.2140.0120.1560.0700.1250.0050.1310.0340.0160.0490.2080.2080.0740.0680.0500.0410.0570.0460.2201.0000.0330.0420.0490.0470.0830.1840.1250.0200.2400.012
race0.0560.0110.0100.2400.0470.4400.4430.3800.0750.0720.0570.0710.0100.0680.0260.0990.0130.0220.1530.0830.0450.0520.0570.0820.0220.0490.0410.0440.0250.0480.0331.0000.0220.0450.0220.0600.1080.0560.0090.0400.050
reason_for_unemployment0.0800.0040.0040.0270.4360.0190.0180.0240.0400.0630.1550.1630.0000.0640.0870.0450.0040.0770.0200.0160.0330.1540.1520.0380.0410.0240.0330.0280.0330.0570.0420.0221.0000.0240.0470.0280.0770.0690.0110.1130.014
region_of_previous_residence0.0690.0140.0000.0920.0270.0620.0620.0660.0600.0480.0300.0270.0070.0170.0210.0180.0050.1350.0520.0290.7070.0270.0220.0370.0110.6290.4670.7090.8730.0200.0490.0450.0241.0000.0070.0300.0440.0080.0030.0280.295
sex0.0610.0580.0730.0100.1230.0250.0240.0280.0570.3720.3010.3950.0110.0680.0140.0380.0640.1040.0130.0360.0060.2930.3370.1630.0300.0050.0050.0090.0050.1040.0470.0220.0470.0071.0000.1590.0330.0720.0390.1170.000
target0.2420.3180.1720.0380.2350.0710.0700.0590.1950.2250.2800.4370.1460.3770.0650.1560.0270.1510.0700.0120.0290.2760.3650.1940.0740.0310.0140.0380.0290.2350.0830.0600.0280.0300.1591.0000.2170.1410.0720.2680.015
tax_filer_stat0.5850.0590.0950.0550.4860.0770.0720.0580.5380.6620.4880.4970.0370.5450.1730.5310.0260.2770.0800.0430.0460.4880.4910.7190.1660.0500.0250.0920.0430.5200.1840.1080.0770.0440.0330.2171.0000.5030.0790.5310.000
veterans_benefits0.6440.0370.0540.0840.3890.0800.0760.0840.5040.6030.3870.3870.0250.7070.1020.6280.7070.3040.0720.0200.0190.3870.3870.4470.1260.0200.0160.1120.0070.4050.1250.0560.0690.0080.0720.1410.5031.0000.0560.3950.003
wage_per_hour0.0390.0050.0050.0170.0830.0050.0070.0070.0520.0360.0670.077-0.0000.0510.0230.0430.0000.0570.0090.0180.0080.0660.0670.0400.3510.0050.0080.0030.0080.2270.0200.0090.0110.0030.0390.0720.0790.0561.0000.2180.007
weeks_worked_in_year0.2690.1300.1050.0350.4480.0300.0280.0210.2760.2210.3050.3100.1520.2870.1840.2830.0210.3250.0260.0250.0410.3060.3040.1990.2210.0300.0400.0380.0410.8760.2400.0400.1130.0280.1170.2680.5310.3950.2181.0000.008
year0.0090.0060.0030.0120.0030.0300.0300.0240.0050.0060.0070.0060.0000.0090.0030.0060.0000.7930.0420.0300.9860.0070.0040.0000.0000.9810.9861.0000.2950.0310.0120.0500.0140.2950.0000.0150.0000.0030.0070.0081.000
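
Each matrix row above is printed as a variable name immediately followed by its correlations, with no separators. Because every value has exactly three decimals (for example `0.127` or `-0.028`), a row can be split mechanically. The following is a minimal sketch of such a parser; `parse_row` is a hypothetical helper, and the sample strings are shortened copies of the start of three rows from the matrix.

```python
import re

# Each correlation is printed with exactly three decimals, optionally negative.
VALUE = re.compile(r"-?\d\.\d{3}")

def parse_row(line):
    """Split one fused matrix row into (label, [values])."""
    first = VALUE.search(line)
    if first is None:
        raise ValueError("no correlation values found")
    label = line[:first.start()]
    values = [float(v) for v in VALUE.findall(line[first.start():])]
    return label, values

# Shortened rows, copied from the start of three rows in the matrix.
rows = [
    "age1.0000.1270.0680.121",
    "capital_gains0.1271.000-0.0280.012",
    "family_members_under_180.4810.0290.0390.086",
]
parsed = dict(parse_row(r) for r in rows)
print(parsed)
```

The pattern requires a single digit before the decimal point, so a label that ends in digits, such as `family_members_under_18`, is still separated correctly from the value that follows it.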

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

| age | class_of_worker | detailed_industry_recode | detailed_occupation_recode | education | wage_per_hour | enroll_in_edu_inst_last_wk | marital_stat | major_industry_code | major_occupation_code | race | hispanic_origin | sex | member_of_a_labor_union | reason_for_unemployment | full_or_part_time_employment_stat | capital_gains | capital_losses | dividends_from_stocks | tax_filer_stat | region_of_previous_residence | state_of_previous_residence | detailed_household_and_family_stat | detailed_household_summary_in_household | instance_weight | migration_code_change_in_msa | migration_code_change_in_reg | migration_code_move_within_reg | live_in_this_house_1_year_ago | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | family_members_under_18 | country_of_birth_father | country_of_birth_mother | country_of_birth_self | citizenship | own_business_or_self_employed | fill_inc_questionnaire_for_veteran's_admin | veterans_benefits | weeks_worked_in_year | year | target
0 | 73 | Not in universe | Not in universe or children | Not in universe | High School Graduate | 0 | Not in universe | Widowed | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 1700.09 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1
1 | 58 | Self-employed | Manufacturing-durable goods | Automobile mechanics and repairers | Some College | 0 | Not in universe | Divorced | Construction | Precision production craft & repair | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Individual Filer | South | Arkansas | Primary Householder | Householder | 1053.55 | MSA movement | Same area | Same county | No | Yes | 1 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1994 | 1
2 | 18 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | High school | Never Married | Not in universe or children | Not in universe | Asian or Pacific Islander | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child 18 or older | 991.95 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Not in universe | Vietnam | Vietnam | Vietnam | Foreign | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1
3 | 9 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1758.14 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1
4 | 10 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1069.16 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1
5 | 48 | Private sector | Personal services | Teachers, except college and university | Some College | 1200 | Not in universe | Married | Entertainment | Professional specialty | Amer Indian Aleut or Eskimo | All other | Female | No | Not in universe | FTE | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Spouse of householder | 162.61 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 1 | Not in universe | Philippines | United-States | United-States | Native | No | Not in universe | Not a Veteran | 52 | 1995 | 1
6 | 42 | Private sector | Manufacturing-durables | Management related occupations | College Graduate | 0 | Not in universe | Married | Finance insurance and real estate | Executive admin and managerial | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 5178 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Householder | 1535.86 | No movement | Same area | Nonmover | Yes | Not in universe | 6 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1994 | 1
7 | 28 | Private sector | Manufacturing-durable goods | Fabricators, assemblers, and hand working | High School Graduate | 0 | Not in universe | Never Married | Construction | Handlers equip cleaners etc | White | All other | Female | Not in universe | Job loser | FTE | 0 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Other | Nonrelative of householder | 898.83 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 4 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 30 | 1995 | 1
8 | 47 | Government | Manufacturing | Food service occupations | Some College | 876 | Not in universe | Married | Education | Adm support including clerical | White | All other | Female | No | Not in universe | FTE | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Spouse of householder | 1661.53 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 5 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1995 | 1
9 | 34 | Private sector | Manufacturing-durable goods | Extractive occupations | Some College | 0 | Not in universe | Married | Construction | Machine operators assmblrs & inspctrs | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Householder | 1146.79 | No movement | Same area | Nonmover | Yes | Not in universe | 6 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1994 | 1
| age | class_of_worker | detailed_industry_recode | detailed_occupation_recode | education | wage_per_hour | enroll_in_edu_inst_last_wk | marital_stat | major_industry_code | major_occupation_code | race | hispanic_origin | sex | member_of_a_labor_union | reason_for_unemployment | full_or_part_time_employment_stat | capital_gains | capital_losses | dividends_from_stocks | tax_filer_stat | region_of_previous_residence | state_of_previous_residence | detailed_household_and_family_stat | detailed_household_summary_in_household | instance_weight | migration_code_change_in_msa | migration_code_change_in_reg | migration_code_move_within_reg | live_in_this_house_1_year_ago | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | family_members_under_18 | country_of_birth_father | country_of_birth_mother | country_of_birth_self | citizenship | own_business_or_self_employed | fill_inc_questionnaire_for_veteran's_admin | veterans_benefits | weeks_worked_in_year | year | target
199513 | 57 | Private sector | Wholesale trade | Extractive occupations | Below High School | 0 | Not in universe | Divorced | Manufacturing-durable goods | Machine operators assmblrs & inspctrs | White | Central or South American | Female | Not in universe | Not in universe | FTE | 0 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Primary Householder | Householder | 743.66 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 4 | Not in universe | Dominican-Republic | Dominican-Republic | Dominican-Republic | Foreign | Not in universe | Not in universe | Not a Veteran | 52 | 1995 | 1
199514 | 51 | Private sector | Public administration | Computer equipment operators | Below High School | 0 | Not in universe | Widowed | Retail trade | Sales | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Individual Filer | South | North Dakota | Primary Householder | Householder | 1302.34 | Non-MSA movement | Same area | Same county | No | Yes | 6 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1994 | 1
199515 | 87 | Not in universe | Not in universe or children | Not in universe | High School Graduate | 0 | Not in universe | Widowed | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Primary Householder | Householder | 3255.80 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Not in universe | ? | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1
199516 | 3 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | Black | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | South | Utah | Child | Nonrelative of householder | 2733.75 | MSA movement | Same area | Same county | No | Yes | 0 | Mother only present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1
199517 | 39 | Private sector | Manufacturing | Food service occupations | College Graduate | 0 | Not in universe | Never Married | Education | Adm support including clerical | Other | Mexican-American | Male | No | Not in universe | FTE | 6849 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Primary Householder | Householder | 908.14 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 6 | Not in universe | Mexico | Mexico | Mexico | Foreign | No | Not in universe | Not a Veteran | 52 | 1995 | 1
199518 | 87 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Married | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Householder | 955.27 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Not in universe | Canada | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1
199519 | 65 | Self-employed | Wholesale and retail trade | Other executive, admin and managerial | Below High School | 0 | Not in universe | Married | Business and repair services | Executive admin and managerial | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 6418 | 0 | 9 | Joint Filer | Not in universe | Not in universe | Primary Householder | Householder | 687.19 | No movement | Same area | Nonmover | Yes | Not in universe | 1 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1994 | 1
199520 | 47 | Not in universe | Not in universe or children | Not in universe | Some College | 0 | Not in universe | Married | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 157 | Joint Filer | Not in universe | Not in universe | Primary Householder | Householder | 1923.03 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 6 | Not in universe | Poland | Poland | Germany | Naturalized | Not in universe | Not in universe | Not a Veteran | 52 | 1995 | 1
199521 | 16 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | High school | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 4664.87 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1
199522 | 32 | Private sector | Public administration and armed forces | Farm operators and managers | High School Graduate | 0 | Not in universe | Never Married | Medical except hospital | Other service | Black | All other | Female | No | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Primary Householder | Householder | 1830.11 | No movement | Same area | Nonmover | Yes | Not in universe | 6 | Not in universe | ? | ? | ? | Foreign | Not in universe | Not in universe | Not a Veteran | 52 | 1994 | 1

Duplicate rows

Most frequently occurring

| age | class_of_worker | detailed_industry_recode | detailed_occupation_recode | education | wage_per_hour | enroll_in_edu_inst_last_wk | marital_stat | major_industry_code | major_occupation_code | race | hispanic_origin | sex | member_of_a_labor_union | reason_for_unemployment | full_or_part_time_employment_stat | capital_gains | capital_losses | dividends_from_stocks | tax_filer_stat | region_of_previous_residence | state_of_previous_residence | detailed_household_and_family_stat | detailed_household_summary_in_household | instance_weight | migration_code_change_in_msa | migration_code_change_in_reg | migration_code_move_within_reg | live_in_this_house_1_year_ago | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | family_members_under_18 | country_of_birth_father | country_of_birth_mother | country_of_birth_self | citizenship | own_business_or_self_employed | fill_inc_questionnaire_for_veteran's_admin | veterans_benefits | weeks_worked_in_year | year | target | # duplicates
14 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1217.42 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1994 | 1 | 3
79 | 17 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | High school | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1724.96 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1994 | 1 | 3
0 | 0 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 1706.01 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Mother only present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1995 | 1 | 2
1 | 1 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | Black | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Other | Nonrelative of householder | 4118.09 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1 | 2
2 | 1 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | Mexican (Mexicano) | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 1231.01 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | Mexico | Mexico | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1 | 2
3 | 3 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 1875.27 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1 | 2
4 | 4 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 1332.16 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Neither parent present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1 | 2
5 | 5 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | Asian or Pacific Islander | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 753.84 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | Philippines | Philippines | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1995 | 1 | 2
6 | 5 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 1175.34 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Neither parent present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1995 | 1 | 2
7 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | Asian or Pacific Islander | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 709.34 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | ? | ? | ? | Foreign | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1 | 2
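
One way a "most frequently occurring" duplicates table like the one above could be derived is to count identical rows and keep those seen more than once. The sketch below is a minimal, standard-library illustration of that idea, not the profiling tool's own implementation; the `most_frequent_duplicates` helper and the three-field toy records are hypothetical.

```python
from collections import Counter

def most_frequent_duplicates(rows, top_n=10):
    """Return (row, count) pairs for rows occurring more than once,
    most frequent first. Rows must be hashable, e.g. tuples."""
    counts = Counter(rows)
    duplicated = [(row, n) for row, n in counts.items() if n > 1]
    # Stable sort: ties keep their first-seen order.
    duplicated.sort(key=lambda pair: pair[1], reverse=True)
    return duplicated[:top_n]

# Toy records (age, education, year) for illustration only.
rows = [
    (15, "Below High School", 1994),
    (0, "Children", 1995),
    (15, "Below High School", 1994),
    (0, "Children", 1995),
    (15, "Below High School", 1994),
    (42, "College Graduate", 1994),
]

top = most_frequent_duplicates(rows)
for row, n in top:
    print(row, n)
```

Note that a row counts as a duplicate only when every field matches, so a near-continuous column such as `instance_weight` has to agree as well.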